Impala hangs while writing a file to HDFS
0 votes
02 April 2020

I use Oozie with impala-shell for ETL. (The Impala query is an INSERT OVERWRITE.)
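For context, the shell action just wraps impala-shell, roughly like this (a minimal sketch — the host, table names, and query here are placeholders, not my real workflow):

```shell
# Hypothetical sketch of the Oozie shell-action body. IMPALAD_HOST, the table
# names, and the query text are placeholders. DRY_RUN=1 prints the command
# instead of executing it, which is handy outside the cluster.
ETL_QUERY='INSERT OVERWRITE TABLE target_table PARTITION (service_terms_id)
SELECT * FROM staging_table'

run_etl() {
  # -k enables Kerberos auth (this cluster is Kerberized), -q runs one query
  local cmd=(impala-shell -k -i "${IMPALAD_HOST:-localhost:21000}" -q "$ETL_QUERY")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```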

However, I found a workflow that had been running much longer than usual and never finished.

I checked the unfinished query and found that Impala was hanging while writing a file to HDFS.

(This query had been running for more than 3 hours; it normally finishes in about 10 minutes.)

I think the problem occurred because the Impala daemon failed to get the ack after writing the file to HDFS.

I don't know why SocketChannelImpl.ensureReadOpen stays blocked.

Can you give me any advice on how to resolve this issue?

My environment:

  • CDH 5.14.2 (Hadoop 2.6.0, Impala 2.11)
  • Kerberos enabled
  • Java 1.8.0_121

Impala daemon thread dump (IP addresses removed)

"ResponseProcessor for block BP-21905457-<my-ip-address>-1502174846412:blk_1162137978_122336977" #341266 daemon prio=5 os_prio=0 tid=0x00007f2930f14000 nid=0x65b3c waiting for monitor entry [0x00007f289262c000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:255)
    - waiting to lock <0x00000002a4d00c60> (a java.lang.Object)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:300)
    - locked <0x00000002a4d00d10> (a java.lang.Object)
    at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at org.apache.hadoop.crypto.CryptoInputStream.readFromUnderlyingStream(CryptoInputStream.java:220)
    at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:200)
    at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:658)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2303)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:235)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:1093)

   Locked ownable synchronizers:
    - None
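To find which threads are stuck, I filter the full jstack dump for BLOCKED threads with a small awk helper (hypothetical; it assumes the standard HotSpot dump format shown above):

```shell
# Print the header line of every thread whose state is BLOCKED.
# Thread headers in a HotSpot dump start with a double quote; the state
# line follows a few lines later, so remember the last header seen.
blocked_threads() {
  awk '/^"/ { hdr = $0 }
       /java\.lang\.Thread\.State: BLOCKED/ { print hdr }'
}
# Usage against the live daemon's embedded JVM:
#   jstack <impalad-jvm-pid> | blocked_threads
```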

DataNode (first DataNode in the pipeline) thread dump

"PacketResponder: BP-21905457-<ip>-1502174846412:blk_1162137978_122336977, type=HAS_DOWNSTREAM_IN_PIPELINE" #32844152 daemon prio=5 os_prio=0 tid=0x00007f2532f2c000 nid=0x65b3b runnable [0x00007f25a2002000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00000000d111e670> (a sun.nio.ch.Util$3)
    - locked <0x00000000d111e680> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000000d111e628> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at org.apache.hadoop.crypto.CryptoInputStream.readFromUnderlyingStream(CryptoInputStream.java:220)
    at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:200)
    at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:658)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2303)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:235)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1291)
    at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
    - None

"DataXceiver for client DFSClient_NONMAPREDUCE_633544602_1 at /<ip>:39456 [Receiving block BP-21905457-<ip>-1502174846412:blk_1162137978_122336977]" #32844151 daemon prio=5 os_prio=0 tid=0x000000000a609800 nid=0x65b3a runnable [0x00007f25a2102000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00000000d11173b0> (a sun.nio.ch.Util$3)
    - locked <0x00000000d11173c0> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000000d1117368> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:198)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    - locked <0x00000000e8de7b60> (a java.io.BufferedInputStream)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:501)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:901)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:808)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
    at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
    - None
  • These look normal to me

Hadoop NameNode log (HDFS paths removed)

2020-04-01 04:11:17,671 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /<data path>/_impala_insert_staging/e346b1f10713e4ba_c69b6dcd00000000/.e346b1f10713e4ba-c69b6dcd00000052_264212949_dir/service_terms_id=21/e346b1f10713e4ba-c69b6dcd00000052_1934381021_data.0.parq. 
BP-21905457-<ip>-1502174846412 
blk_1162137978_122336977{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,  replicas=[
    ReplicaUnderConstruction[[DISK]DS-513944a8-a1ca-4272-a521-959f8ebd6c4d:NORMAL:<ip>:1004|RBW], 
    ReplicaUnderConstruction[[DISK]DS-b912b9ce-93df-4c0f-b25d-55be36dae3e8:NORMAL:<ip>:1004|RBW], 
    ReplicaUnderConstruction[[DISK]DS-b6db02ea-1bbf-4920-9754-79c68eafba7f:NORMAL:<ip>:1004|RBW]]
}

# no blockMap update log entry found

DataNode log

# node 1
2020-04-01 04:11:17,673 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-21905457-<ip>-1502174846412:blk_1162137978_122336977 src: /<ip>:39456 dest: /<ip>:1004

2020-04-01 09:40:45,634 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1162137978_122336977 file /data2/dfs/dn/current/BP-21905457-<ip>-1502174846412/current/rbw/blk_1162137978 for deletion
2020-04-01 09:40:45,635 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-21905457-<ip>-1502174846412 blk_1162137978_122336977 file /data2/dfs/dn/current/BP-21905457-<ip>-1502174846412/current/rbw/blk_1162137978

# node 2 and node 3 logs are the same as node 1
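The DataNode lines above were collected by grepping each node's log for the block id; a trivial helper (the log location is an assumption — CDH's default is /var/log/hadoop-hdfs):

```shell
# Print every line mentioning a block id from the given DataNode log files.
trace_block() {  # trace_block <block-id> <log-file>...
  grep -h -- "$1" "${@:2}"
}
# Usage on each node:
#   trace_block blk_1162137978 /var/log/hadoop-hdfs/*DATANODE*.log*
```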

Impala daemon log

  • Impala logged no warnings or errors

Netstat info

  • Recv-Q keeps growing
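This is how I watch Recv-Q (a sketch; port 1004 is the Kerberized DataNode data-transfer port from my setup — adjust it for your cluster):

```shell
# Filter `netstat -tan` output: keep TCP sockets on local port 1004 whose
# Recv-Q (column 2) is non-zero, i.e. data the daemon is not reading.
hung_recvq() {
  awk '$1 ~ /^tcp/ && $2 + 0 > 0 && $4 ~ /:1004$/'
}
# Usage on a DataNode:
#   netstat -tan | hung_recvq
```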

Please let me know if you need more information. Thanks. :)
