I have installed Apache Hadoop 2.7.5 and Apache Spark 2.3.0.
When I submit my job with --master local[*], it runs fine. But when I run it with --master yarn, the error in the web logs says:
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
Here is the command I run:
spark-submit --class com.spark.SparkTest --master yarn --deploy-mode cluster /root/Downloads/SimpleSpark-0.0.1-SNAPSHOT.jar
And the console output is:
[root@localhost sbin]# spark-submit --class com.spark.SparkTest --master yarn --deploy-mode cluster /root/Downloads/SimpleSpark-0.0.1-SNAPSHOT.jar
2018-05-12 17:24:37 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-05-12 17:24:39 INFO RMProxy:98 - Connecting to ResourceManager at /0.0.0.0:8032
2018-05-12 17:24:40 INFO Client:54 - Requesting a new application from cluster with 1 NodeManagers
2018-05-12 17:24:40 INFO Client:54 - Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
2018-05-12 17:24:40 INFO Client:54 - Will allocate AM container, with 1408 MB memory including 384 MB overhead
2018-05-12 17:24:40 INFO Client:54 - Setting up container launch context for our AM
2018-05-12 17:24:40 INFO Client:54 - Setting up the launch environment for our AM container
2018-05-12 17:24:40 INFO Client:54 - Preparing resources for our AM container
2018-05-12 17:24:43 INFO Client:54 - Uploading resource file:/opt/spark-2.3.0/yarn/spark-2.3.0-yarn-shuffle.jar -> hdfs://localhost:9000/user/root/.sparkStaging/application_1526143826498_0001/spark-2.3.0-yarn-shuffle.jar
2018-05-12 17:24:45 INFO Client:54 - Uploading resource file:/root/Downloads/SimpleSpark-0.0.1-SNAPSHOT.jar -> hdfs://localhost:9000/user/root/.sparkStaging/application_1526143826498_0001/SimpleSpark-0.0.1-SNAPSHOT.jar
2018-05-12 17:24:45 WARN DFSClient:611 - Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1252)
at java.lang.Thread.join(Thread.java:1326)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:370)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:546)
2018-05-12 17:24:45 WARN Client:66 - Same name resource file:/opt/spark-2.3.0/yarn/spark-2.3.0-yarn-shuffle.jar added multiple times to distributed cache
2018-05-12 17:24:45 INFO Client:54 - Uploading resource file:/tmp/spark-6db13382-d02d-4e8a-b5bf-5aafd535ba1e/__spark_conf__789951835863303071.zip -> hdfs://localhost:9000/user/root/.sparkStaging/application_1526143826498_0001/__spark_conf__.zip
2018-05-12 17:24:46 WARN DFSClient:611 - Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1252)
at java.lang.Thread.join(Thread.java:1326)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:370)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:546)
2018-05-12 17:24:46 INFO SecurityManager:54 - Changing view acls to: root
2018-05-12 17:24:46 INFO SecurityManager:54 - Changing modify acls to: root
2018-05-12 17:24:46 INFO SecurityManager:54 - Changing view acls groups to:
2018-05-12 17:24:46 INFO SecurityManager:54 - Changing modify acls groups to:
2018-05-12 17:24:46 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2018-05-12 17:24:46 INFO Client:54 - Submitting application application_1526143826498_0001 to ResourceManager
2018-05-12 17:24:46 INFO YarnClientImpl:273 - Submitted application application_1526143826498_0001
2018-05-12 17:24:47 INFO Client:54 - Application report for application_1526143826498_0001 (state: ACCEPTED)
2018-05-12 17:24:47 INFO Client:54 -
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1526145886541
final status: UNDEFINED
tracking URL: http://localhost.localdomain:8088/proxy/application_1526143826498_0001/
user: root
2018-05-12 17:24:48 INFO Client:54 - Application report for application_1526143826498_0001 (state: ACCEPTED)
2018-05-12 17:24:49 INFO Client:54 - Application report for application_1526143826498_0001 (state: ACCEPTED)
2018-05-12 17:24:50 INFO Client:54 - Application report for application_1526143826498_0001 (state: ACCEPTED)
2018-05-12 17:24:51 INFO Client:54 - Application report for application_1526143826498_0001 (state: ACCEPTED)
2018-05-12 17:24:52 INFO Client:54 - Application report for application_1526143826498_0001 (state: ACCEPTED)
2018-05-12 17:24:53 INFO Client:54 - Application report for application_1526143826498_0001 (state: ACCEPTED)
2018-05-12 17:24:54 INFO Client:54 - Application report for application_1526143826498_0001 (state: FAILED)
2018-05-12 17:24:54 INFO Client:54 -
client token: N/A
diagnostics: Application application_1526143826498_0001 failed 2 times due to AM Container for appattempt_1526143826498_0001_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://localhost.localdomain:8088/cluster/app/application_1526143826498_0001Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1526143826498_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
at org.apache.hadoop.util.Shell.run(Shell.java:482)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1526145886541
final status: FAILED
tracking URL: http://localhost.localdomain:8088/cluster/app/application_1526143826498_0001
user: root
2018-05-12 17:24:54 INFO Client:54 - Deleted staging directory hdfs://localhost:9000/user/root/.sparkStaging/application_1526143826498_0001
Exception in thread "main" org.apache.spark.SparkException: Application application_1526143826498_0001 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1159)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1518)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2018-05-12 17:24:55 INFO ShutdownHookManager:54 - Shutdown hook called
2018-05-12 17:24:55 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-6db13382-d02d-4e8a-b5bf-5aafd535ba1e
2018-05-12 17:24:55 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-1218ca67-7fae-4c0b-b678-002963a1cf08
The diagnostics say:
Application application_1526143826498_0001 failed 2 times due to AM Container for appattempt_1526143826498_0001_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://localhost.localdomain:8088/cluster/app/application_1526143826498_0001Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1526143826498_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
at org.apache.hadoop.util.Shell.run(Shell.java:482)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
When I click through to the logs for details:
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
Here is my spark-defaults.conf:
spark.master spark://localhost.localdomain:7077
spark.eventLog.enabled true
spark.eventLog.dir hdfs://localhost.localdomain:8021/user/spark/logs
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.memory 1g
spark.executor.memory 1g
spark.yarn.dist.jars /opt/spark-2.3.0/yarn/spark-2.3.0-yarn-shuffle.jar
spark.yarn.jars /opt/spark-2.3.0/yarn/spark-2.3.0-yarn-shuffle.jar
# spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
My spark-env.sh:
SPARK_MASTER_HOST=localhost.localdomain
SPARK_MASTER_PORT=7077
SPARK_LOCAL_IP=localhost.localdomain
SPARK_CONF_DIR=${SPARK_HOME}/conf
HADOOP_CONF_DIR=/opt/hadoop-2.7.5/etc/hadoop
YARN_CONF_DIR=/opt/hadoop-2.7.5/etc/hadoop
SPARK_EXECUTOR_CORES=2
SPARK_EXECUTOR_MEMORY=500M
SPARK_DRIVER_MEMORY=500M
And my yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>
/opt/hadoop-2.7.5/etc/hadoop,
/opt/hadoop-2.7.5/*,
/opt/hadoop-2.7.5/lib/*,
/opt/hadoop-2.7.5/share/hadoop/common/*,
/opt/hadoop-2.7.5/share/hadoop/common/lib/*
/opt/hadoop-2.7.5/share/hadoop/hdfs/*,
/opt/hadoop-2.7.5/share/hadoop/hdfs/lib/*,
/opt/hadoop-2.7.5/share/hadoop/mapreduce/*,
/opt/hadoop-2.7.5/share/hadoop/mapreduce/lib/*,
/opt/hadoop-2.7.5/share/hadoop/tools/lib/*,
/opt/hadoop-2.7.5/share/hadoop/yarn/*,
/opt/hadoop-2.7.5/share/hadoop/yarn/lib/*
</value>
</property>
</configuration>
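A long comma-separated value like yarn.application.classpath is easy to mistype, so it can help to list the entries one by one. The following is a minimal sketch, not a statement about how YARN itself parses the value: it naively splits on commas and reports entries that still contain whitespace (for example, two paths joined by a line break because a comma is missing). The function names `classpath_entries` and `entries_with_whitespace` are hypothetical helpers introduced only for this illustration.

```python
import xml.etree.ElementTree as ET


def classpath_entries(yarn_site_xml, prop="yarn.application.classpath"):
    """Return the comma-separated entries of `prop` from yarn-site.xml text.

    Assumption: a plain split on ',' is good enough for inspection;
    this does not claim to mirror Hadoop's own parsing.
    """
    root = ET.fromstring(yarn_site_xml)
    for prop_node in root.iter("property"):
        if prop_node.findtext("name") == prop:
            value = prop_node.findtext("value") or ""
            # Trim surrounding whitespace and drop empty pieces.
            return [e.strip() for e in value.split(",") if e.strip()]
    return []


def entries_with_whitespace(entries):
    """Return entries that still contain whitespace after trimming,
    which usually means two paths were not separated by a comma."""
    return [e for e in entries if any(ch.isspace() for ch in e)]


if __name__ == "__main__":
    import sys

    # Usage: python check_classpath.py /path/to/yarn-site.xml
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            entries = classpath_entries(f.read())
        for entry in entries:
            print(repr(entry))
        print("suspicious:", entries_with_whitespace(entries))
```

Printing each entry with `repr` makes embedded line breaks visible, which a plain `print` would hide.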
I have copied spark-yarn_2.11-2.3.0.jar to /opt/hadoop-2.7.5/share/hadoop/yarn/*.
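Since the error names a specific class, one thing that can be checked directly is whether a given JAR actually contains it. A JAR is a zip archive in which a class `a.b.C` is stored as the entry `a/b/C.class`, so a small script can look for it. This is only a sketch under that assumption; the helper name `jar_contains_class` is hypothetical, and the script does not tell whether YARN actually puts the JAR on the container's classpath.

```python
import sys
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the JAR at jar_path has an entry for class_name."""
    # A fully qualified class name maps to a path-like entry in the archive.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


if __name__ == "__main__":
    # Usage: python check_jar.py <jar> [<jar> ...]
    target = "org.apache.spark.deploy.yarn.ApplicationMaster"
    for path in sys.argv[1:]:
        print(path, jar_contains_class(path, target))
```

It could be run against both the copied spark-yarn JAR and the JAR listed under spark.yarn.jars in spark-defaults.conf, to see which of them, if any, carries the class from the error message.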
I went through several Stack Overflow answers that suggested passing --conf "spark.driver.extraJavaOptions=-Diop.version=4.1.0.0", but that did not work in my case.
One answer mentioned missing logging JARs, but I am not sure which JARs it meant. Am I missing some configuration?