Hadoop Streaming: why does the Hadoop MapReduce job fail with a shell script when the script works on the command line?
0 votes
02 October 2018

I am trying to run a simple MapReduce job using Hadoop Streaming, but the streaming job fails with the error below.
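The streaming invocation itself is not shown in the post; for context, a typical submission of a shell-script mapper looks roughly like the following (the jar path and HDFS paths are placeholders, not taken from the original job):

```shell
# Hypothetical invocation sketch -- all paths are placeholders.
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input  /user/example/input \
    -output /user/example/output \
    -mapper count.sh \
    -file   count.sh    # ship the script to every task node
```

Note that `-file count.sh` (or the newer `-files`) is what distributes the script to the task nodes; without it the subprocess cannot be launched at all.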

Error log

STREAM: yarn.timeline-service.leveldb-timeline-store.start-time-read-cache-size=10000
 STREAM: yarn.timeline-service.leveldb-timeline-store.start-time-write-cache-size=10000
 STREAM: yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms=300000
 STREAM: yarn.timeline-service.store-class=org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore
 STREAM: yarn.timeline-service.ttl-enable=true
 STREAM: yarn.timeline-service.ttl-ms=604800000
 STREAM: yarn.timeline-service.webapp.address=${yarn.timeline-service.hostname}:8188
 STREAM: yarn.timeline-service.webapp.https.address=${yarn.timeline-service.hostname}:8190
 STREAM: zlib.compress.level=DEFAULT_COMPRESSION
 STREAM: ====
 STREAM: submitting to jobconf: local
 18/10/02 13:49:15 INFO hdfs.DFSClient: Created token for hadoop_nifi_svc: HDFS_DELEGATION_TOKEN owner=XXXXXXXXXXX, renewer=yarn, realUser=, issueDate=1538506155462, maxDate=1539110955462, sequenceNumber=313257, masterKeyId=436 on ha-hdfs:HDFS-Service1
 18/10/02 13:49:15 INFO security.TokenCache: Got dt for hdfs://xxxxxxx
 18/10/02 13:49:15 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm357
 18/10/02 13:49:15 INFO mapred.FileInputFormat: Total input paths to process : 1
 18/10/02 13:49:15 INFO mapreduce.JobSubmitter: number of splits:2
 18/10/02 13:49:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
 18/10/02 13:49:16 INFO impl.YarnClientImpl: Submitted application application_1538446556016_0347
 18/10/02 13:49:16 INFO mapreduce.Job: The url to track the job: 
 18/10/02 13:49:16 INFO mapreduce.Job: Running job: job_1538446556016_0347
 18/10/02 13:49:23 INFO mapreduce.Job: Job job_1538446556016_0347 running in uber mode : false
 18/10/02 13:49:23 INFO mapreduce.Job:  map 0% reduce 0%
 18/10/02 13:49:28 INFO mapreduce.Job: Task Id : attempt_1538446556016_0347_m_000001_0, Status : FAILED
 Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:422)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

 18/10/02 13:49:28 INFO mapreduce.Job: Task Id : attempt_1538446556016_0347_m_000000_0, Status : FAILED
 Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2

 18/10/02 13:49:34 INFO mapreduce.Job: Task Id : attempt_1538446556016_0347_m_000001_1, Status : FAILED
 Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2

 18/10/02 13:49:34 INFO mapreduce.Job: Task Id : attempt_1538446556016_0347_m_000000_1, Status : FAILED
 Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2

 18/10/02 13:49:38 INFO mapreduce.Job: Task Id : attempt_1538446556016_0347_m_000000_2, Status : FAILED
 Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2

 18/10/02 13:49:38 INFO mapreduce.Job: Task Id : attempt_1538446556016_0347_m_000001_2, Status : FAILED
 Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2

 18/10/02 13:49:45 INFO mapreduce.Job:  map 100% reduce 100%
 18/10/02 13:49:45 INFO mapreduce.Job: Job job_1538446556016_0347 failed with state FAILED due to: Task failed task_1538446556016_0347_m_000001
 Job failed as tasks failed. failedMaps:1 failedReduces:0

 18/10/02 13:49:45 INFO mapreduce.Job: Counters: 13
         Job Counters
                 Failed map tasks=7
                 Killed map tasks=1
                 Killed reduce tasks=68
                 Launched map tasks=8
                 Other local map tasks=8
                 Total time spent by all maps in occupied slots (ms)=242104
                 Total time spent by all reduces in occupied slots (ms)=0
                 Total time spent by all map tasks (ms)=30263
                 Total vcore-milliseconds taken by all map tasks=30263
                 Total megabyte-milliseconds taken by all map tasks=247914496
         Map-Reduce Framework
                 CPU time spent (ms)=0
                 Physical memory (bytes) snapshot=0
                 Virtual memory (bytes) snapshot=0
       18/10/02 13:49:45 ERROR streaming.StreamJob: Job not successful!
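The repeated "subprocess failed with code 2" is the key clue: 2 is the exit status bash returns when a script fails to parse, so the streaming framework is reporting a shell syntax error in the mapper, not a Hadoop problem. This can be reproduced locally, independent of Hadoop (a minimal sketch):

```shell
# A `for` loop whose `do` keyword is swallowed into the word list
# (no newline or `;` before it) is a shell syntax error; bash aborts
# with exit status 2 -- the same code the streaming log reports.
bash -c 'for word in hello world do echo $word; done' 2>/dev/null
echo "exit status: $?"
```

The same happens with `if [ -n $word ] then` when `then` lacks a preceding newline or semicolon.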

count.sh

#!/usr/bin/bash

while read line
do
  for word in $line
  do
    if [ -n "$word" ]
    then
      wcount=`echo $word | wc -m`
      wlength=`expr $wcount - 1`
      letter=`echo $word | head -c1`
      echo -e "$letter\t$wlength"
    fi
  done
done
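For reference, here is the same per-word logic wrapped in a function for local testing (a sketch using POSIX `${#word}` and `cut` instead of `wc`/`expr`/`head`, so it avoids `wc` output padding; the emitted records are unchanged):

```shell
# For each word on each input line, emit "<first letter><TAB><length>".
count_words() {
  while read -r line; do
    for word in $line; do                          # rely on word splitting
      if [ -n "$word" ]; then
        wlength=${#word}                           # length of the word
        letter=$(printf '%s' "$word" | cut -c1)    # first character
        printf '%s\t%s\n' "$letter" "$wlength"
      fi
    done
  done
}

echo "hello world" | count_words
```

Running this on the line "hello world" emits one tab-separated pair per word.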

Problem:

If the script is run from the Unix command line, it works fine and produces the expected output.

I have tried various ways of declaring the input file for the hadoop command; it made no difference, no success.

What am I doing wrong? Any tips or ideas are much appreciated, thanks.
