We are running Confluent Kafka (https://github.com/confluentinc/cp-helm-charts) on Kubernetes (1.14.6). Log retention is 30 minutes and storage is 300 GB. We have 4 brokers and a replication factor of 3. Throughput is about 65 MB/s. Roughly an hour after startup we see the error below. Each Kafka broker has a 6 GB heap.
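A quick capacity estimate helps check whether 300 GB is enough for a 30-minute window. The Python sketch below rests on assumptions the question does not state: that 65 MB/s is producer ingress before replication, that load is spread evenly across the 4 brokers, that the 300 GB is per broker, and that units are decimal (1 GB = 1000 MB). Under those assumptions it works out to roughly 88 GB per broker.

```python
# Back-of-the-envelope disk estimate for the cluster described above.
# ASSUMPTIONS (not from the question): 65 MB/s is ingress before replication,
# load is even across brokers, and 1 GB = 1000 MB.

INGRESS_MB_PER_S = 65
RETENTION_S = 30 * 60      # 30-minute retention window
REPLICATION_FACTOR = 3
BROKERS = 4


def retained_gb_per_broker(ingress_mb_s, retention_s, rf, brokers):
    """Approximate data each broker holds for one full retention window."""
    total_mb = ingress_mb_s * retention_s * rf  # all replicas, cluster-wide
    return total_mb / brokers / 1000            # per broker, in GB


if __name__ == "__main__":
    gb = retained_gb_per_broker(INGRESS_MB_PER_S, RETENTION_S, REPLICATION_FACTOR, BROKERS)
    print(f"~{gb:.1f} GB per broker for a {RETENTION_S // 60}-minute window")
```

This is a lower bound: Kafka only deletes closed log segments whose newest record is older than the retention time, and by default a segment is rolled at 1 GiB (`segment.bytes`) or after 7 days (`segment.ms`), so low-traffic partitions can keep data well beyond 30 minutes.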
[2019-09-27 12:32:05,278] WARN [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Error when sending leader epoch request for Map(RT-15-7 -> (currentLeaderEpoch=Optional[8], leaderEpoch=6), RT-17-0 -> (currentLeaderEpoch=Optional[5], leaderEpoch=3), RT-19-6 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), RT-02-0 -> (currentLeaderEpoch=Optional[5], leaderEpoch=3), RT-27-4 -> (currentLeaderEpoch=Optional[6], leaderEpoch=4), RT-22-3 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), RT-32-5 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), RT-42-4 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), RT-27-1 -> (currentLeaderEpoch=Optional[9], leaderEpoch=7), _confluent-controlcenter-5-2-0-1-MetricsAggregateStore-repartition-2 -> (currentLeaderEpoch=Optional[11], leaderEpoch=9), RT-21-6 -> (currentLeaderEpoch=Optional[8], leaderEpoch=6), _confluent-controlcenter-5-2-0-1-metrics-trigger-measurement-rekey-3 -> (currentLeaderEpoch=Optional[11], leaderEpoch=9), RT-30-6 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), RT-06-6 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), _confluent-controlcenter-5-2-0-1-expected-group-consumption-rekey-1 -> (currentLeaderEpoch=Optional[11], leaderEpoch=9), RT-17-1 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), _confluent-metrics-10 -> (currentLeaderEpoch=Optional[11], leaderEpoch=9), RT-21-0 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), _confluent-monitoring-9 -> (currentLeaderEpoch=Optional[11], leaderEpoch=9), RT-17-9 -> (currentLeaderEpoch=Optional[9], leaderEpoch=7), RT-02-9 -> (currentLeaderEpoch=Optional[9], leaderEpoch=7), RT-20-1 -> (currentLeaderEpoch=Optional[8], leaderEpoch=6), RT-30-0 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), RT-12-0 -> (currentLeaderEpoch=Optional[8], leaderEpoch=6), RT-32-9 -> (currentLeaderEpoch=Optional[6], leaderEpoch=4), RT-02-1 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), RT-06-0 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), _confluent-monitoring-3 -> (currentLeaderEpoch=Optional[11], leaderEpoch=9), _confluent-metrics-7 -> (currentLeaderEpoch=Optional[11], leaderEpoch=9), _confluent-controlcenter-5-2-0-1-MetricsAggregateStore-changelog-0 -> (currentLeaderEpoch=Optional[11], leaderEpoch=9), RT-27-5 -> (currentLeaderEpoch=Optional[9], leaderEpoch=7), RT-17-6 -> (currentLeaderEpoch=Optional[6], leaderEpoch=4), RT-32-3 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), RT-02-6 -> (currentLeaderEpoch=Optional[6], leaderEpoch=4), _confluent-monitoring-0 -> (currentLeaderEpoch=Optional[11], leaderEpoch=9), RT-27-2 -> (currentLeaderEpoch=Optional[6], leaderEpoch=4), _confluent-controlcenter-5-2-0-1-actual-group-consumption-rekey-2 -> (currentLeaderEpoch=Optional[11], leaderEpoch=9)) (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 3 was disconnected before the response was read
at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:100)
at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:107)
at kafka.server.ReplicaFetcherThread.fetchEpochEndOffsets(ReplicaFetcherThread.scala:310)
at kafka.server.AbstractFetcherThread.truncateToEpochEndOffsets(AbstractFetcherThread.scala:208)
at kafka.server.AbstractFetcherThread.maybeTruncate(AbstractFetcherThread.scala:173)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:113)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:89)
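To see which topics were caught up in the failed request, the map in the WARN line can be pulled apart with a small script. In the log, broker 2 (`replicaId=2`) is asking leader broker 3 (`leaderId=3`) for epoch end offsets so it can decide whether to truncate its replicas, which matches the `truncateToEpochEndOffsets` frame in the stack trace. The Python helper below is hypothetical: its regular expression is inferred from the single log line above, not from any documented Kafka log format, so it may need adjusting for other versions.

```python
import re
from collections import Counter

# HYPOTHETICAL helper: the pattern is inferred from the one WARN line above,
# e.g. "RT-15-7 -> (currentLeaderEpoch=Optional[8], leaderEpoch=6)".
# The greedy topic group backtracks so the last "-<digits>" is the partition.
ENTRY = re.compile(
    r"([\w.-]+)-(\d+) -> \(currentLeaderEpoch=Optional\[(\d+)\], leaderEpoch=(\d+)\)"
)


def parse_epoch_request(log_line):
    """Return (topic, partition, current_leader_epoch, requested_epoch) tuples."""
    return [
        (topic, int(part), int(current), int(requested))
        for topic, part, current, requested in ENTRY.findall(log_line)
    ]


def partitions_per_topic(log_line):
    """Count how many partitions of each topic appear in the request."""
    return Counter(topic for topic, _, _, _ in parse_epoch_request(log_line))


if __name__ == "__main__":
    # Excerpt copied from the WARN line in the question.
    excerpt = (
        "Error when sending leader epoch request for Map("
        "RT-15-7 -> (currentLeaderEpoch=Optional[8], leaderEpoch=6), "
        "RT-17-0 -> (currentLeaderEpoch=Optional[5], leaderEpoch=3), "
        "RT-17-1 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), "
        "_confluent-metrics-10 -> (currentLeaderEpoch=Optional[11], leaderEpoch=9))"
    )
    for topic, count in partitions_per_topic(excerpt).most_common():
        print(f"{topic}: {count} partition(s)")
```

Grouping by topic only shows which partitions broker 2 was replicating from broker 3 at the time; it does not by itself explain why the connection dropped.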
The rest of the configuration is default. I'm not sure what is causing ZooKeeper to close the socket connection. I can also see that all my pods are healthy. Please let me know if I should add more information. Any debugging pointers would be appreciated.
[Images: Grafana dashboard showing brokers and partitions; other useful metrics]