Я пытаюсь внедрить Kafka с zookeeper в моей сети, но столкнулся со странной проблемой с zookeeper. Я посмотрел вокруг Google и понял, что многие другие пользователи сообщали о такой проблеме, но никто не опубликовал никакого правильного решения для этого.
Моя текущая установка имеет 3 разных узла zookeeper (32 ГБ, выделенные ящики оперативной памяти)
Проблема в том, что, если я убью лидера зоопарка, оба оставшихся узла последователей также выйдут из строя, и они не восстановятся в течение по крайней мере следующих 15-20 минут.
Все, что я получаю в логах zookeeper, это "таймауты уведомлений" без объяснения причин
Вот мой конфигурационный файл zookeeper
tickTime=2000
initLimit=10
syncLimit=5
maxClientCnxns=100
maxSessionTimeout=50000
dataDir=/var/lib/zookeeper
clientPort=2181
autopurge.snapRetainCount=100
autopurge.purgeInterval=1
preAllocSize=131072
snapCount=3000000
server.1=zo1:2888:3888
server.2=zo2:2888:3888
server.3=zo3:2888:3888
в моем файле / etc / hosts я сопоставил zo1, zo2, zo3 с их IP-адресом.
Примечание: я также проверил, установив текущий узел ip на 0.0.0.0, это не имеет никакого значения.
всего несколько минут назад я проверил его и снова не удалось восстановить.
Поскольку у меня есть три узла кластера zo1, zo2 и zo3. zo3 был лидером, а zo1 и zo2 были последователями. после того, как я убил узел zo3. автоматическое восстановление заняло около 13 минут. я получил следующие журналы в zo1 и zo2.
Журнал для zo1.
tail /var/lib/zookeeper/zookeeper.out -n 10000 | grep 'QuorumPeer'
2019-01-02 10:25:50,848 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@140] - Shutting down
2019-01-02 10:25:50,848 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@505] - shutting down
2019-01-02 10:25:50,848 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor@107] - Shutting down
2019-01-02 10:25:50,848 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:CommitProcessor@184] - Shutting down
2019-01-02 10:25:50,848 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@402] - shutdown of request processor complete
2019-01-02 10:25:50,849 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@208] - Shutting down
2019-01-02 10:25:50,849 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@865] - LOOKING
2019-01-02 10:25:50,850 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@818] - New election. My id = 1, proposed zxid=0x2d00035c8e
2019-01-02 10:25:51,057 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 400
2019-01-02 10:25:51,458 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 800
2019-01-02 10:25:52,259 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 1600
2019-01-02 10:25:53,859 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 3200
2019-01-02 10:25:57,060 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 6400
2019-01-02 10:26:03,461 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 12800
2019-01-02 10:26:16,262 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 25600
2019-01-02 10:26:41,862 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 51200
2019-01-02 10:27:33,063 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:28:33,065 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:29:33,066 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:30:33,066 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:31:33,067 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:32:33,068 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:33:33,069 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:34:33,069 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:35:33,070 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:36:33,071 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:37:33,071 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:38:33,072 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:39:33,073 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:40:33,074 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:41:33,075 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:42:33,076 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:43:33,076 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:43:33,082 [myid:1] - INFO [WorkerSender[myid=1]:QuorumPeer$QuorumServer@167] - Resolved hostname: zo3 to address: zo3/144.76.xxx.xxx
2019-01-02 10:43:33,091 [myid:1] - INFO [WorkerSender[myid=1]:QuorumPeer$QuorumServer@167] - Resolved hostname: zo3 to address: zo3/144.76.xxx.xxx
2019-01-02 10:43:33,290 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@935] - FOLLOWING
2019-01-02 10:43:33,290 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@173] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 50000 datadir /var/lib/zookeeper/version-2 snapdir /var/lib/zookeeper/version-2
2019-01-02 10:43:33,291 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@64] - FOLLOWING - LEADER ELECTION TOOK - 1062441
2019-01-02 10:43:33,291 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer$QuorumServer@167] - Resolved hostname: zo2 to address: zo2/88.198.35.34
2019-01-02 10:43:33,294 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Learner@237] - Unexpected exception, tries=0, connecting to zo2/88.198.35.34:2888
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
2019-01-02 10:43:34,468 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Learner@332] - Getting a diff from the leader 0x2d00035c8e
2019-01-02 10:43:35,120 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@687] - Established session 0x2680a49e3dc0013 with negotiated timeout 6000 for client /5.9.xxx.xxx:36664
2019-01-02 10:43:35,244 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@687] - Established session 0x1680a49b6b90011 with negotiated timeout 30000 for client /5.9.xxx.xxx:36668
2019-01-02 10:43:35,625 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@118] - Got zxid 0x2e00000001 expected 0x1
Журналы с узла zo2, который стал лидером позже
2019-01-02 10:25:50,852 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1044] - Closed socket connection for client /5.9.xxx.xxx:21218 which had sessionid 0x2680a49e3dc0012
2019-01-02 10:25:50,852 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@140] - Shutting down
2019-01-02 10:25:50,853 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@505] - shutting down
2019-01-02 10:25:50,853 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor@107] - Shutting down
2019-01-02 10:25:50,854 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:CommitProcessor@184] - Shutting down
2019-01-02 10:25:50,854 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@402] - shutdown of request processor complete
2019-01-02 10:25:50,856 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@208] - Shutting down
2019-01-02 10:25:50,857 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:QuorumPeer@865] - LOOKING
2019-01-02 10:25:50,858 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@818] - New election. My id = 2, proposed zxid=0x2d00035c8e
2019-01-02 10:25:51,061 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 400
2019-01-02 10:25:51,462 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 800
2019-01-02 10:25:52,283 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 1600
2019-01-02 10:25:53,884 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 3200
2019-01-02 10:25:57,084 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 6400
2019-01-02 10:26:03,485 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 12800
2019-01-02 10:26:16,286 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 25600
2019-01-02 10:26:41,887 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 51200
2019-01-02 10:27:33,087 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:28:33,088 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:29:33,089 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:30:33,090 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:31:33,091 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:32:33,092 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:33:33,092 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:34:33,093 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:35:33,094 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:36:33,095 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:37:33,095 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:38:33,096 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:39:33,097 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:40:33,098 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:41:33,099 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:42:33,100 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
2019-01-02 10:43:33,293 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:QuorumPeer@947] - LEADING
2019-01-02 10:43:33,299 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Leader@62] - TCP NoDelay set to: true
2019-01-02 10:43:33,301 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@173] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 50000 datadir /var/lib/zookeeper/version-2 snapdir /var/lib/zookeeper/version-2
2019-01-02 10:43:33,301 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Leader@371] - LEADING - LEADER ELECTION TOOK - 1062443
2019-01-02 10:43:34,307 [myid:2] - INFO [LearnerHandler-/144.76.120.143:64542:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@33d2c290
2019-01-02 10:43:34,509 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Leader@961] - Have quorum of supporters, sids: [ 1,2 ]; starting up and setting last processed zxid: 0x2e00000000
Как вы видите, все, что я получаю, это непрерывные тайм-ауты в журнале без объяснения причин.
Я тестировал его, так как более недели не могу найти никакого решения для этого.
Я был бы очень признателен, если бы кто-нибудь указал мне правильное направление.
Спасибо