We have a 3-node Kafka cluster deployment with 5 topics and 6 partitions per topic. We have configured a replication factor of 3. We are seeing a very strange problem: the number of open file descriptors exceeds the limit (which is 50K for our application).
According to the lsof output and our analysis:
1. There are 15K established connections from the Kafka producer/consumer to the brokers, and at the same time the thread dump shows thousands of entries for the Kafka 'admin-client-network-thread':
"admin-client-network-thread" #224398 daemon prio=5 os_prio=0 tid=0x00007f12ca119800 nid=0x5363 runnable [0x00007f12c4db8000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x00000005e0603238> (a sun.nio.ch.Util$3)
- locked <0x00000005e0603228> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000005e0602f08> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.apache.kafka.common.network.Selector.select(Selector.java:672)
at org.apache.kafka.common.network.Selector.poll(Selector.java:396)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:238)
- locked <0x00000005e0602dc0> (a org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:214)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:205)
at kafka.admin.AdminClient$$anon$1.run(AdminClient.scala:61)
at java.lang.Thread.run(Thread.java:748)
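To put a number on the thread growth without taking a full thread dump each time, something like the following could be run inside the affected JVM (for example from a debug endpoint or a scheduled task). This is only a minimal sketch: the class name `AdminThreadCounter` is hypothetical, and the prefix is simply the thread name seen in the dump above. The `kafka.admin.AdminClient$$anon$1.run` frame suggests each of these threads belongs to one `kafka.admin.AdminClient` instance, so a steadily rising count would be consistent with clients being created and never closed.

```java
public class AdminThreadCounter {
    // Thread-name prefix taken from the thread dump above.
    static final String PREFIX = "admin-client-network-thread";

    /** Counts live threads in this JVM whose name starts with the given prefix. */
    static long countThreadsWithPrefix(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(prefix))
                .count();
    }

    public static void main(String[] args) {
        // Only sees threads of the JVM it runs in, so it has to be called
        // from within the application that shows the leak.
        System.out.println(PREFIX + " threads: " + countThreadsWithPrefix(PREFIX));
    }
}
```

Logging this value periodically alongside the file descriptor count would show whether the two grow together.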
2. The lsof output shows 35K entries for pipe and eventpoll descriptors:
java 5441 app 374r FIFO 0,9 0t0 22415240 pipe
java 5441 app 375w FIFO 0,9 0t0 22415240 pipe
java 5441 app 376u a_inode 0,10 0 6379 [eventpoll]
java 5441 app 377r FIFO 0,9 0t0 22473333 pipe
java 5441 app 378r FIFO 0,9 0t0 28054726 pipe
java 5441 app 379r FIFO 0,9 0t0 22415241 pipe
java 5441 app 380w FIFO 0,9 0t0 22415241 pipe
java 5441 app 381u a_inode 0,10 0 6379 [eventpoll]
java 5441 app 382w FIFO 0,9 0t0 22473333 pipe
java 5441 app 383u a_inode 0,10 0 6379 [eventpoll]
java 5441 app 384u a_inode 0,10 0 6379 [eventpoll]
java 5441 app 385r FIFO 0,9 0t0 40216087 pipe
java 5441 app 386r FIFO 0,9 0t0 22483470 pipe
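The same breakdown can be obtained from inside the process by reading `/proc/self/fd`, which avoids parsing lsof output. The sketch below is Linux-only and assumes the usual symlink targets (`pipe:[inode]`, `socket:[inode]`, `anon_inode:[eventpoll]`); the class name `FdTypeCounter` is hypothetical. As far as I understand, on Linux with JDK 8 every open NIO `Selector` holds one epoll descriptor plus a wakeup pipe (two descriptors), so pipe and eventpoll counts growing together would point at selectors that are never closed.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class FdTypeCounter {
    /** Maps a /proc/<pid>/fd symlink target to a coarse category. */
    static String classify(String target) {
        if (target.startsWith("pipe:")) return "pipe";
        if (target.equals("anon_inode:[eventpoll]")) return "eventpoll";
        if (target.startsWith("socket:")) return "socket";
        return "other";
    }

    /** Counts the entries of an fd directory per category. */
    static Map<String, Integer> countByType(Path fdDir) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                String target;
                try {
                    target = Files.readSymbolicLink(fd).toString();
                } catch (IOException e) {
                    continue; // descriptor was closed while we were scanning
                }
                counts.merge(classify(target), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current JVM; pass /proc/<pid>/fd to inspect another
        // process (requires permission to read that directory).
        Path fdDir = Paths.get(args.length > 0 ? args[0] : "/proc/self/fd");
        System.out.println(countByType(fdDir));
    }
}
```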
Setup details:
Apache Kafka client: 1.0.1
Kafka version: 1.0.1
OpenJDK: java-1.8.0-openjdk-1.8.0.222.b10-1
CentOS version: CentOS Linux release 7.6.1810
Note: After restarting the VM, the file descriptor count dropped back to a normal value of 1000. A few seconds later the count started to increase again, and it reaches the 50K limit after one week, even when the system is idle.
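To track this growth over the week, the descriptor count can also be read from the JVM itself. The following is a rough sketch, assuming a HotSpot/OpenJDK JVM on a Unix-like OS where the `com.sun.management.UnixOperatingSystemMXBean` extension is available; the class name `FdCountProbe` is hypothetical, and the methods return -1 where the bean is not available.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdCountProbe {
    /** Open file descriptors of this JVM, or -1 if the platform bean is unavailable. */
    static long openFdCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    /** Descriptor limit for this JVM, or -1 if the platform bean is unavailable. */
    static long maxFdCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("open fds: " + openFdCount() + " / limit: " + maxFdCount());
    }
}
```

Calling `openFdCount()` from a scheduled task and logging the value would show how fast the count climbs and whether the growth lines up with a particular periodic job.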