Active NameNode goes down in Hadoop
27 August 2018

My Hadoop cluster has:

1 active NameNode

1 standby NameNode

3 JournalNodes

4 DataNodes

According to my analysis, the active NameNode went down because it could not write edits to a majority of the JournalNodes. The standby NameNode did not take over after the active NameNode failed because passwordless access between the NameNodes had not been enabled.
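To make the quorum condition in this analysis concrete, here is a minimal Python sketch. It assumes the usual majority rule for quorum-based journaling (more than half of the JournalNodes must acknowledge a write); the function names are hypothetical and only illustrate the arithmetic, they are not part of Hadoop.

```python
def majority(journal_nodes: int) -> int:
    """Smallest number of JournalNodes that forms a majority."""
    return journal_nodes // 2 + 1


def write_succeeds(journal_nodes: int, acknowledged: int) -> bool:
    """True if enough JournalNodes acknowledged the edit (assumed majority rule)."""
    return acknowledged >= majority(journal_nodes)


if __name__ == "__main__":
    # 3 JournalNodes, as in the cluster described above.
    print(majority(3))           # 2
    print(write_succeeds(3, 1))  # False
    print(write_succeeds(3, 2))  # True
```

Under this assumption, three JournalNodes need two acknowledgements per write, so a single responder is not enough.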

Active NameNode log

2018-07-22 00:49:05,496 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:49:07,490 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:08,491 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 7003 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:08,500 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:49:09,493 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 8004 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:10,493 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 9005 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:11,495 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 10006 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:11,506 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:49:12,495 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 11007 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:13,496 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 12008 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:14,498 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 13009 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:14,512 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:49:15,498 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 14010 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:16,500 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 15011 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:17,500 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 16012 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:17,518 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:49:18,502 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 17013 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:19,503 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 18015 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:20,504 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19016 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:20,524 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:49:21,489 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [JOURNALNODE_IP:8485, JOURNALNODE_IP:8485, JOURNALNODE_IP:8485], stream=QuorumOutputStream starting at txid 203478))

java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.

at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)

at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)

at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)

at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)

at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)

at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)

at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)

at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)

at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:647)

at org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1266)

at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1203)

at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1300)

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:5836)

at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1122)

at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142)

at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)

at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

2018-07-22 00:49:21,491 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting QuorumOutputStream starting at txid 203478

2018-07-22 00:49:21,494 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1

2018-07-22 00:49:21,496 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:

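/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at ACTIVE_NAMENODE_IP/ACTIVE_NAMENODE_IP

************************************************************/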

Standby NameNode log

2018-07-22 00:43:51,605 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:43:53,341 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:43:53,341 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:43:54,609 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:43:55,336 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:43:56,336 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 7002 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:43:56,347 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:43:56,347 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:43:57,338 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 8003 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:43:57,615 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:43:58,339 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 9005 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:43:59,340 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 10006 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:43:59,353 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:43:59,353 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:44:00,342 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 11007 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:44:00,621 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:44:01,342 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 12008 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:44:02,343 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 13009 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:44:02,359 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:44:02,359 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:44:03,345 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 14010 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:44:03,627 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:44:04,345 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 15011 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:44:05,347 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 16012 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:44:05,365 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:44:05,365 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:44:06,347 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 17013 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:44:06,633 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:44:07,348 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 18014 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:44:08,350 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19015 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:44:08,371 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:44:08,371 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:44:09,336 WARN org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to determine input streams from QJM to [JOURNALNODE_IP:8485, JOURNALNODE_IP:8485, JOURNALNODE_IP:8485]. Skipping.

java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.

at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)

at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:471)

at org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:278)

at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1508)

at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1532)

at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:214)

at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331)

at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)

at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)

at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)

at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
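The "Waited ... ms" lines in the two NameNode logs above can be summarized with a small script. This is only a sketch: the regular expression is written against the exact line shapes pasted in this question, and the names `WAIT_RE` and `parse_wait_line` are hypothetical helpers, not Hadoop tooling.

```python
import re

# Matches the QuorumJournalManager progress lines shown in the logs above.
WAIT_RE = re.compile(
    r"Waited (?P<waited>\d+) ms \(timeout=(?P<timeout>\d+) ms\) "
    r"for a response for (?P<op>\w+)\. "
    r"(?:Succeeded so far: \[(?P<ok>[^\]]*)\]|No responses yet\.)"
)


def parse_wait_line(line: str):
    """Return (operation, waited_ms, timeout_ms, responders) or None."""
    m = WAIT_RE.search(line)
    if m is None:
        return None
    ok = m.group("ok")
    # "No responses yet." leaves the group unset, which means no responders.
    responders = [h.strip() for h in ok.split(",") if h.strip()] if ok else []
    return m.group("op"), int(m.group("waited")), int(m.group("timeout")), responders
```

Feeding each log line through `parse_wait_line` and looking at the length of the responder list shows how many JournalNodes had answered at each point before the 20000 ms timeout.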

JournalNode log

2018-07-22 02:43:04,209 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8485: readAndProcess from client ACTIVE_NAMENODE_IP threw exception [java.io.IOException: Connection reset by peer]

java.io.IOException: Connection reset by peer

at sun.nio.ch.FileDispatcherImpl.read0(Native Method)

at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)

at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)

at sun.nio.ch.IOUtil.read(IOUtil.java:197)

at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)

at org.apache.hadoop.ipc.Server.channelRead(Server.java:2603)

at org.apache.hadoop.ipc.Server.access$2800(Server.java:136)

at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1481)

at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:771)

at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:637)

at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:608)

2018-07-22 02:43:04,212 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8485: readAndProcess from client STANDBY_NAMENODE_IP threw exception [java.io.IOException: Connection reset by peer]

java.io.IOException: Connection reset by peer

at sun.nio.ch.FileDispatcherImpl.read0(Native Method)

at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)

at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)

at sun.nio.ch.IOUtil.read(IOUtil.java:197)

at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)

at org.apache.hadoop.ipc.Server.channelRead(Server.java:2603)

at org.apache.hadoop.ipc.Server.access$2800(Server.java:136)

at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1481)

at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:771)

at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:637)

at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:608)
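Since the NameNode logs show repeated "Retrying connect to server" messages for port 8485, one possible first check is whether that port accepts TCP connections from each NameNode host. The following is a minimal sketch using only the Python standard library; `can_connect` and `check_journal_nodes` are hypothetical helper names, and the host list must be filled in with the real JournalNode addresses. A successful connection only shows that the port is open, not that the JournalNode is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_journal_nodes(hosts, port=8485, timeout_s=3.0):
    """Map each host to whether its JournalNode port (8485 in the logs) is reachable."""
    return {host: can_connect(host, port, timeout_s) for host in hosts}
```

Running `check_journal_nodes([...])` on both NameNode hosts would show whether the connection failures are specific to one host or affect the same JournalNodes from both.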

Note

In the logs, I replaced the IP addresses of the active NameNode, the standby NameNode, and the JournalNodes with ACTIVE_NAMENODE_IP, STANDBY_NAMENODE_IP, and JOURNALNODE_IP respectively.

So what caused the NameNode to fail?

...