Кафки-брокеры с полностью записанными хранилищами
Я пытался выдать столько сообщений, сколько мог обработать брокер. С полностью записанными хранилищами (8 ГБ) все брокеры остановились и не могут снова включиться с этой ошибкой
журналы брокеров, пытающихся перезапустить
[2020-04-28 04:34:05,774] INFO [ThrottledChannelReaper-Request]: Stopped (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2020-04-28 04:34:05,774] INFO [ThrottledChannelReaper-Request]: Shutdown completed (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2020-04-28 04:34:05,842] INFO [KafkaServer id=1] shut down completed (kafka.server.KafkaServer)
[2020-04-28 04:34:05,844] INFO Shutting down SupportedServerStartable (io.confluent.support.metrics.SupportedServerStartable)
[2020-04-28 04:33:58,847] INFO [ThrottledChannelReaper-Produce]: Stopped (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2020-04-28 04:34:05,844] INFO Closing BaseMetricsReporter (io.confluent.support.metrics.BaseMetricsReporter)
[2020-04-28 04:34:05,844] INFO Waiting for metrics thread to exit (io.confluent.support.metrics.SupportedServerStartable)
[2020-04-28 04:34:05,844] INFO Shutting down KafkaServer (io.confluent.support.metrics.SupportedServerStartable)
[2020-04-28 04:34:05,845] INFO [KafkaServer id=1] shutting down (kafka.server.KafkaServer)
[2020-04-28 04:33:58,847] INFO [ThrottledChannelReaper-Request]: Shutting down (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2020-04-28 04:33:59,847] INFO [ThrottledChannelReaper-Request]: Shutdown completed (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2020-04-28 04:33:59,847] INFO [ThrottledChannelReaper-Request]: Stopped (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2020-04-28 04:33:59,854] INFO [KafkaServer id=0] shut down completed (kafka.server.KafkaServer)
[2020-04-28 04:33:59,942] INFO Shutting down SupportedServerStartable (io.confluent.support.metrics.SupportedServerStartable)
[2020-04-28 04:33:59,942] INFO Closing BaseMetricsReporter (io.confluent.support.metrics.BaseMetricsReporter)
[2020-04-28 04:33:59,942] INFO Waiting for metrics thread to exit (io.confluent.support.metrics.SupportedServerStartable)
[2020-04-28 04:33:59,942] INFO Shutting down KafkaServer (io.confluent.support.metrics.SupportedServerStartable)
[2020-04-28 04:33:59,942] INFO [KafkaServer id=0] shutting down (kafka.server.KafkaServer)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[2020-04-28 04:37:18,938] INFO [ReplicaAlterLogDirsManager on broker 2] Removed fetcher for partitions Set(__consumer_offsets-22, kafka-connect-offset-15, kafka-connect-offset-16, kafka-connect-offset-2, __consumer_offsets-30, kafka-connect-offset-7, kafka-connect-offset-13, __consumer_offsets-8, __consumer_offsets-21, kafka-connect-offset-8, kafka-connect-status-2, kafka-connect-offset-5, __consumer_offsets-4, __consumer_offsets-27, __consumer_offsets-7, __consumer_offsets-9, __consumer_offsets-46, kafka-connect-offset-19, __consumer_offsets-25, __consumer_offsets-35, __consumer_offsets-41, __consumer_offsets-33, __consumer_offsets-23, __consumer_offsets-49, kafka-connect-offset-20, kafka-connect-offset-3, __consumer_offsets-47, __consumer_offsets-16, __consumer_offsets-28, kafka-connect-config-0, kafka-connect-offset-9, kafka-connect-offset-17, __consumer_offsets-31, __consumer_offsets-36, kafka-connect-status-1, __consumer_offsets-42, __consumer_offsets-3, __consumer_offsets-18, __consumer_offsets-37, __consumer_offsets-15, __consumer_offsets-24, kafka-connect-offset-10, kafka-connect-offset-24, kafka-connect-status-4, __consumer_offsets-38, __consumer_offsets-17, __consumer_offsets-48, kafka-connect-offset-23, kafka-connect-offset-21, kafka-connect-offset-0, __consumer_offsets-19, __consumer_offsets-11, kafka-connect-status-0, __consumer_offsets-13, kafka-connect-offset-18, __consumer_offsets-2, __consumer_offsets-43, __consumer_offsets-6, __consumer_offsets-14, kafka-connect-offset-14, kafka-connect-offset-22, kafka-connect-offset-6, perf-test4-0, kafka-connect-status-3, kafka-connect-offset-11, kafka-connect-offset-12, __consumer_offsets-20, __consumer_offsets-0, kafka-connect-offset-4, __consumer_offsets-44, __consumer_offsets-39, kafka-connect-offset-1, __consumer_offsets-12, __consumer_offsets-45, __consumer_offsets-1, __consumer_offsets-5, __consumer_offsets-26, __consumer_offsets-29, __consumer_offsets-34, __consumer_offsets-10, __consumer_offsets-32, __consumer_offsets-40) (kafka.server.ReplicaAlterLogDirsManager)
[2020-04-28 04:37:18,990] INFO [ReplicaManager broker=2] Broker 2 stopped fetcher for partitions __consumer_offsets-22,kafka-connect-offset-15,kafka-connect-offset-16,kafka-connect-offset-2,__consumer_offsets-30,kafka-connect-offset-7,kafka-connect-offset-13,__consumer_offsets-8,__consumer_offsets-21,kafka-connect-offset-8,kafka-connect-status-2,kafka-connect-offset-5,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,kafka-connect-offset-19,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,kafka-connect-offset-20,kafka-connect-offset-3,__consumer_offsets-47,__consumer_offsets-16,__consumer_offsets-28,kafka-connect-config-0,kafka-connect-offset-9,kafka-connect-offset-17,__consumer_offsets-31,__consumer_offsets-36,kafka-connect-status-1,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,kafka-connect-offset-10,kafka-connect-offset-24,kafka-connect-status-4,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,kafka-connect-offset-23,kafka-connect-offset-21,kafka-connect-offset-0,__consumer_offsets-19,__consumer_offsets-11,kafka-connect-status-0,__consumer_offsets-13,kafka-connect-offset-18,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,kafka-connect-offset-14,kafka-connect-offset-22,kafka-connect-offset-6,perf-test4-0,kafka-connect-status-3,kafka-connect-offset-11,kafka-connect-offset-12,__consumer_offsets-20,__consumer_offsets-0,kafka-connect-offset-4,__consumer_offsets-44,__consumer_offsets-39,kafka-connect-offset-1,__consumer_offsets-12,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40 and stopped moving logs for partitions because they are in the failed log directory /opt/kafka/data/logs. (kafka.server.ReplicaManager)
[2020-04-28 04:37:18,990] INFO Stopping serving logs in dir /opt/kafka/data/logs (kafka.log.LogManager)
[2020-04-28 04:37:18,992] WARN [Producer clientId=producer-1] 1 partitions have leader brokers without a matching listener, including [__confluent.support.metrics-0] (org.apache.kafka.clients.NetworkClient)
[2020-04-28 04:37:18,996] ERROR Shutdown broker because all log dirs in /opt/kafka/data/logs have failed (kafka.log.LogManager)
Снимок мониторинга Prometheus
Я надеюсь предотвратить эту ситуацию до того, как у брокера появятся такие решения, как удаление предыдущих сообщений, чтобы освободить место или что-то еще. есть ли для этого лучшие практики?