Hi team,
For a week now I have been trying to read trending tags from Twitter in my Spark application.
My environment, set up by our organization's client, is as below:
CDH 5.14
Spark 1.6
Scala 2.10
Below is the error I am getting; I am pasting the full log for reference.
I have tried to resolve the problem as described below, but to no avail.
[main] INFO com.unraveldata.agent.ResourceCollector - Unravel Sensor 4.5.1.1rc0013/1.3.11.3 initializing.
20/01/25 18:12:09 INFO DriverProbe: Spark Live Updates Disabled: true
20/01/25 18:12:09 INFO spark.SparkContext: Running Spark version 1.6.0
20/01/25 18:12:10 INFO spark.SecurityManager: Changing view acls to: ZB609239
20/01/25 18:12:10 INFO spark.SecurityManager: Changing modify acls to: ZB609239
20/01/25 18:12:10 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ZB609239); users with modify permissions: Set(ZB609239)
20/01/25 18:12:10 INFO util.Utils: Successfully started service 'sparkDriver' on port 52545.
20/01/25 18:12:10 INFO slf4j.Slf4jLogger: Slf4jLogger started
20/01/25 18:12:10 INFO Remoting: Starting remoting
20/01/25 18:12:11 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.9.65.243:50540]
20/01/25 18:12:11 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriverActorSystem@10.9.65.243:50540]
20/01/25 18:12:11 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 50540.
20/01/25 18:12:11 INFO spark.SparkEnv: Registering MapOutputTracker
20/01/25 18:12:11 INFO spark.SparkEnv: Registering BlockManagerMaster
20/01/25 18:12:11 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-2e313029-5057-446c-b3b4-e3fbee2c7afc
20/01/25 18:12:11 INFO storage.MemoryStore: MemoryStore started with capacity 530.3 MB
20/01/25 18:12:11 INFO spark.SparkEnv: Registering OutputCommitCoordinator
20/01/25 18:12:11 INFO util.Utils: Successfully started service 'SparkUI' on port 33950.
20/01/25 18:12:11 INFO ui.SparkUI: Started SparkUI at http://10.9.65.243:33950
20/01/25 18:12:11 INFO spark.SparkContext: Added JAR /home/zb609239/twitter4j-core-4.0.4.jar at spark://10.9.65.243:52545/jars/twitter4j-core-4.0.4.jar with timestamp 1579975931440
20/01/25 18:12:11 INFO spark.SparkContext: Added JAR /home/zb609239/twitter4j-stream-4.0.4.jar at spark://10.9.65.243:52545/jars/twitter4j-stream-4.0.4.jar with timestamp 1579975931441
20/01/25 18:12:11 INFO spark.SparkContext: Added JAR spark-streaming-twitter_2.10-1.6.1.jar at spark://10.9.65.243:52545/jars/spark-streaming-twitter_2.10-1.6.1.jar with timestamp 1579975931441
20/01/25 18:12:12 INFO yarn.Client: Requesting a new application from cluster with 70 NodeManagers
20/01/25 18:12:12 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (36864 MB per container)
20/01/25 18:12:12 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
20/01/25 18:12:12 INFO yarn.Client: Setting up container launch context for our AM
20/01/25 18:12:12 INFO yarn.Client: Setting up the launch environment for our AM container
20/01/25 18:12:12 INFO yarn.Client: Preparing resources for our AM container
20/01/25 18:12:12 INFO yarn.YarnSparkHadoopUtil: getting token for: hdfs://nameservice1/user/ZB609239/.sparkStaging/application_1579122798111_703147
20/01/25 18:12:13 INFO hdfs.DFSClient: Created token for ZB609239: HDFS_DELEGATION_TOKEN owner=ZB609239@IUSER.IROOT.ADIDOM.COM, renewer=yarn, realUser=, issueDate=1579975932976, maxDate=1580580732976, sequenceNumber=60369611, masterKeyId=2639 on ha-hdfs:nameservice1
20/01/25 18:12:14 INFO hive.metastore: Trying to connect to metastore with URI thrift://tplhc01c001.iuser.iroot.adidom.com:9083
20/01/25 18:12:14 INFO hive.metastore: Opened a connection to metastore, current connections: 1
20/01/25 18:12:14 INFO hive.metastore: Connected to metastore.
20/01/25 18:12:14 INFO metadata.Hive: Registering function nvl com.techm.hive.utils.GenericUDFNVL
20/01/25 18:12:14 INFO metadata.Hive: Registering function row_number com.techm.hive.utils.GenericUDFRank
20/01/25 18:12:14 INFO metadata.Hive: Registering function sysdate com.techm.hive.utils.UDFSysDate
20/01/25 18:12:14 INFO metadata.Hive: Registering function sdate com.techm.hive.utils.UDFSysDate
20/01/25 18:12:14 INFO metadata.Hive: Registering function sysdte com.techm.hive.utils.UDFSysDate
20/01/25 18:12:14 INFO metadata.Hive: Registering function testfunction com.techm.hive.utils.UDFSysDate
20/01/25 18:12:14 INFO metadata.Hive: Registering function ups org.hue.udf.MyUpper
20/01/25 18:12:15 INFO hive.metastore: Closed a connection to metastore, current connections: 0
20/01/25 18:12:15 INFO yarn.Client: Uploading resource file:/tmp/spark-424037da-7567-42d0-8d63-0cd23074b36b/__spark_conf__5410820092210479896.zip -> hdfs://nameservice1/user/ZB609239/.sparkStaging/application_1579122798111_703147/__spark_conf__5410820092210479896.zip
20/01/25 18:12:15 INFO spark.SecurityManager: Changing view acls to: ZB609239
20/01/25 18:12:15 INFO spark.SecurityManager: Changing modify acls to: ZB609239
20/01/25 18:12:15 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ZB609239); users with modify permissions: Set(ZB609239)
20/01/25 18:12:15 INFO yarn.Client: Submitting application 703147 to ResourceManager
20/01/25 18:12:15 INFO impl.YarnClientImpl: Submitted application application_1579122798111_703147
20/01/25 18:12:16 INFO yarn.Client: Application report for application_1579122798111_703147 (state: ACCEPTED)
20/01/25 18:12:16 INFO yarn.Client:
     client token: Token { kind: YARN_CLIENT_TOKEN, service: }
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: root.NONP.HAASAAP0761_10696
     start time: 1579975935463
     final status: UNDEFINED
     tracking URL: http://tplhc01c007.iuser.iroot.adidom.com:8088/proxy/application_1579122798111_703147/
     user: ZB609239
20/01/25 18:12:17 INFO yarn.Client: Application report for application_1579122798111_703147 (state: ACCEPTED)
20/01/25 18:12:18 INFO yarn.Client: Application report for application_1579122798111_703147 (state: ACCEPTED)
20/01/25 18:12:19 INFO yarn.Client: Application report for application_1579122798111_703147 (state: ACCEPTED)
20/01/25 18:12:20 INFO yarn.Client: Application report for application_1579122798111_703147 (state: ACCEPTED)
20/01/25 18:12:21 INFO yarn.Client: Application report for application_1579122798111_703147 (state: ACCEPTED)
20/01/25 18:12:22 INFO yarn.Client: Application report for application_1579122798111_703147 (state: ACCEPTED)
20/01/25 18:12:23 INFO yarn.Client: Application report for application_1579122798111_703147 (state: ACCEPTED)
20/01/25 18:12:24 INFO yarn.Client: Application report for application_1579122798111_703147 (state: ACCEPTED)
20/01/25 18:12:25 INFO yarn.Client: Application report for application_1579122798111_703147 (state: ACCEPTED)
20/01/25 18:12:26 INFO yarn.Client: Application report for application_1579122798111_703147 (state: ACCEPTED)
20/01/25 18:12:27 INFO yarn.Client: Application report for application_1579122798111_703147 (state: ACCEPTED)
20/01/25 18:12:28 INFO yarn.Client: Application report for application_1579122798111_703147 (state: ACCEPTED)
20/01/25 18:12:29 INFO yarn.Client: Application report for application_1579122798111_703147 (state: ACCEPTED)
20/01/25 18:12:30 INFO yarn.Client: Application report for application_1579122798111_703147 (state: ACCEPTED)
20/01/25 18:12:31 INFO yarn.Client: Application report for application_1579122798111_703147 (state: ACCEPTED)
20/01/25 18:12:32 INFO yarn.Client: Application report for application_1579122798111_703147 (state: ACCEPTED)
20/01/25 18:12:33 INFO yarn.Client: Application report for application_1579122798111_703147 (state: ACCEPTED)
20/01/25 18:12:34 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
20/01/25 18:12:34 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> tplhc01c007.iuser.iroot.adidom.com,tplhc01c009.iuser.iroot.adidom.com, PROXY_URI_BASES -> http://tplhc01c007.iuser.iroot.adidom.com:8088/proxy/application_1579122798111_703147,http://tplhc01...), /proxy/application_1579122798111_703147
20/01/25 18:12:34 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
20/01/25 18:12:34 INFO yarn.Client: Application report for application_1579122798111_703147 (state: ACCEPTED)
20/01/25 18:12:35 INFO yarn.Client: Application report for application_1579122798111_703147 (state: RUNNING)
20/01/25 18:12:35 INFO yarn.Client:
     client token: Token { kind: YARN_CLIENT_TOKEN, service: }
     diagnostics: N/A
     ApplicationMaster host: 10.9.65.223
     ApplicationMaster RPC port: 0
     queue: root.NONP.HAASAAP0761_10696
     start time: 1579975935463
     final status: UNDEFINED
     tracking URL: http://tplhc01c007.iuser.iroot.adidom.com:8088/proxy/application_1579122798111_703147/
     user: ZB609239
20/01/25 18:12:35 INFO cluster.YarnClientSchedulerBackend: Application application_1579122798111_703147 has started running.
20/01/25 18:12:35 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 60197.
20/01/25 18:12:35 INFO netty.NettyBlockTransferService: Server created on 60197
20/01/25 18:12:35 INFO storage.BlockManager: external shuffle service port = 7337
20/01/25 18:12:35 INFO storage.BlockManagerMaster: Trying to register BlockManager
20/01/25 18:12:35 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.9.65.243:60197 with 530.3 MB RAM, BlockManagerId(driver, 10.9.65.243, 60197)
20/01/25 18:12:35 INFO storage.BlockManagerMaster: Registered BlockManager
20/01/25 18:12:36 INFO scheduler.EventLoggingListener: Logging events to hdfs://nameservice1/user/spark/applicationHistory/application_1579122798111_703147
20/01/25 18:12:36 INFO spark.SparkContext: Registered listener com.cloudera.spark.lineage.ClouderaNavigatorListener
20/01/25 18:12:36 INFO spark.SparkContext: Registered listener org.apache.spark.UnravelListener
20/01/25 18:12:41 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
20/01/25 18:12:59 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 2.0 (TID 70, tplhc01d191.iuser.iroot.adidom.com, executor 2): java.lang.NoSuchMethodError: twitter4j.TwitterStream.addListener(Ltwitter4j/StreamListener;)V
    at org.apache.spark.streaming.twitter.TwitterReceiver.onStart(TwitterInputDStream.scala:72)
    at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
    at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
    at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:575)
    at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:565)
    at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2022)
    at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2022)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
20/01/25 18:13:00 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 3.0 (TID 72, tplhc01d191.iuser.iroot.adidom.com, executor 1): java.lang.NoSuchMethodError: twitter4j.TwitterStream.addListener(Ltwitter4j/StreamListener;)V
    at org.apache.spark.streaming.twitter.TwitterReceiver.onStart(TwitterInputDStream.scala:72)
    at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
    at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
    at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:575)
    at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:565)
    at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2022)
    at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2022)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
20/01/25 18:13:01 ERROR cluster.YarnScheduler: Lost executor 2 on tplhc01d191.iuser.iroot.adidom.com: Container marked as failed: container_e168_1579122798111_703147_01_000003 on host: tplhc01d191.iuser.iroot.adidom.com. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_e168_1579122798111_703147_01_000003
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
    at org.apache.hadoop.util.Shell.run(Shell.java:507)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:399)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Shell output: main : command provided 1
main : run as user is ZB609239
main : requested yarn user is ZB609239
Writing to tmp file /data/14/yarn/nm/nmPrivate/application_1579122798111_703147/container_e168_1579122798111_703147_01_000003/container_e168_1579122798111_703147_01_000003.pid.tmp
Writing to cgroup task files...
Container exited with a non-zero exit code 50
20/01/25 18:13:01 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 5.0 (TID 73, tplhc01d191.iuser.iroot.adidom.com, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Container marked as failed: container_e168_1579122798111_703147_01_000003 on host: tplhc01d191.iuser.iroot.adidom.com. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_e168_1579122798111_703147_01_000003
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
    at org.apache.hadoop.util.Shell.run(Shell.java:507)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:399)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Shell output: main : command provided 1
main : run as user is ZB609239
main : requested yarn user is ZB609239
Writing to tmp file /data/14/yarn/nm/nmPrivate/application_1579122798111_703147/container_e168_1579122798111_703147_01_000003/container_e168_1579122798111_703147_01_000003.pid.tmp
Writing to cgroup task files...
Container exited with a non-zero exit code 50
20/01/25 18:13:01 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_e168_1579122798111_703147_01_000003 on host: tplhc01d191.iuser.iroot.adidom.com. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_e168_1579122798111_703147_01_000003
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
    at org.apache.hadoop.util.Shell.run(Shell.java:507)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:399)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Shell output: main : command provided 1
main : run as user is ZB609239
main : requested yarn user is ZB609239
Writing to tmp file /data/14/yarn/nm/nmPrivate/application_1579122798111_703147/container_e168_1579122798111_703147_01_000003/container_e168_1579122798111_703147_01_000003.pid.tmp
Writing to cgroup task files...
Container exited with a non-zero exit code 50
20/01/25 18:13:02 ERROR cluster.YarnScheduler: Lost executor 1 on tplhc01d191.iuser.iroot.adidom.com: Container marked as failed: container_e168_1579122798111_703147_01_000002 on host: tplhc01d191.iuser.iroot.adidom.com. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_e168_1579122798111_703147_01_000002
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
    at org.apache.hadoop.util.Shell.run(Shell.java:507)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:399)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Shell output: main : command provided 1
main : run as user is ZB609239
main : requested yarn user is ZB609239
Writing to tmp file /data/18/yarn/nm/nmPrivate/application_1579122798111_703147/container_e168_1579122798111_703147_01_000002/container_e168_1579122798111_703147_01_000002.pid.tmp
Writing to cgroup task files...
Container exited with a non-zero exit code 50
20/01/25 18:13:02 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_e168_1579122798111_703147_01_000002 on host: tplhc01d191.iuser.iroot.adidom.com. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_e168_1579122798111_703147_01_000002
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
    at org.apache.hadoop.util.Shell.run(Shell.java:507)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:399)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Shell output: main : command provided 1
main : run as user is ZB609239
main : requested yarn user is ZB609239
Writing to tmp file /data/18/yarn/nm/nmPrivate/application_1579122798111_703147/container_e168_1579122798111_703147_01_000002/container_e168_1579122798111_703147_01_000002.pid.tmp
Writing to cgroup task files...
Container exited with a non-zero exit code 50
20/01/25 18:13:02 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 5.0 (TID 74, tplhc01d191.iuser.iroot.adidom.com, executor 1): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container marked as failed: container_e168_1579122798111_703147_01_000002 on host: tplhc01d191.iuser.iroot.adidom.com. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_e168_1579122798111_703147_01_000002
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
    at org.apache.hadoop.util.Shell.run(Shell.java:507)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:399)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Shell output: main : command provided 1
main : run as user is ZB609239
main : requested yarn user is ZB609239
Writing to tmp file /data/18/yarn/nm/nmPrivate/application_1579122798111_703147/container_e168_1579122798111_703147_01_000002/container_e168_1579122798111_703147_01_000002.pid.tmp
Writing to cgroup task files...
Container exited with a non-zero exit code 50
My spark-submit command looks like this:
spark-submit \
  --queue NONP.HAASXXXX_XXX \
  --conf spark.ui.port=0 \
  --jars "/home/XXXXX/spark-streaming-twitter_2.10-1.6.1.jar,/home/XXXXX/twitter4j-core-4.0.4.jar,/home/XXXXX/twitter4j-stream-4.0.4.jar" \
  --conf spark.yarn.user.classpath.first=true \
  --conf spark.shuffle.memoryFraction=0.5 \
  --conf spark.yarn.executor.memoryOverhead=8192 \
  --conf spark.akka.timeout=300 \
  --conf spark.storage.blockManagerSlaveTimeoutMs=300000 \
  --conf spark.yarn.driver.memoryOverhead=2048 \
  --executor-memory 25g \
  --driver-memory 4g \
  --num-executors 45 \
  --executor-cores 5 \
  --conf spark.executor.extraJavaOptions="-XX:-UseGCOverheadLimit" \
  --conf spark.executor.userClassPathFirst=true \
  --conf spark.driver.userClassPathFirst=true \
  --class com.sparkStreaming.TwitterTags \
  TwitterTagsBT.jar XXXX XXXXXRR RRRREEXX WWEERRDD twittertagfilter
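In case it helps, the three extra jars above come from my build. A dependency layout that keeps exactly one twitter4j version on the classpath would look roughly like the sketch below (a build.sbt sketch, assuming an sbt build; as far as I know, spark-streaming-twitter_2.10 1.6.1 is built against twitter4j 4.0.4):

// build.sbt sketch (assumption: sbt build; Spark is "provided" since it ships with CDH)
scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"              % "1.6.0" % "provided",
  "org.apache.spark" %% "spark-streaming"         % "1.6.0" % "provided",
  "org.apache.spark" %% "spark-streaming-twitter" % "1.6.1",
  // pin twitter4j so only one version can end up on the runtime classpath
  "org.twitter4j"     % "twitter4j-core"          % "4.0.4",
  "org.twitter4j"     % "twitter4j-stream"        % "4.0.4"
)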
My code is as below. Just to add, I have also tried twitter4j versions 3.0.6, 3.0.3, and 4.0.7; with all of them I get the same error as above and it does not connect. Please advise what the problem could be here.
package com.sparkStreaming

import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

object TwitterTags {

  def main(args: Array[String]) {
    if (args.length < 4) {
      System.err.println("Usage: TwitterTags <consumer key> <consumer secret> " +
        "<access token> <access token secret> [<filters>]")
      System.exit(1)
    }

    // Set logging level if log4j is not configured (override by adding log4j.properties to the classpath)
    if (!Logger.getRootLogger.getAllAppenders.hasMoreElements) {
      Logger.getRootLogger.setLevel(Level.WARN)
    }

    val Array(consumerKey, consumerSecret, accessToken, accessTokenSecret) = args.take(4)
    //val filters = args.takeRight(args.length - 4)

    // Set the system properties so that the Twitter4j library used by the Twitter stream
    // can use them to generate OAuth credentials
    System.setProperty("twitter4j.oauth.consumerKey", consumerKey)
    System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret)
    System.setProperty("twitter4j.oauth.accessToken", accessToken)
    System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret)

    val sparkConf = new SparkConf()
      .setAppName("TwitterHashTagJoinSentiments")
      .setJars(Array(
        "/home/zb609239/twitter4j-core-4.0.4.jar",
        "/home/zb609239/twitter4j-stream-4.0.4.jar",
        "spark-streaming-twitter_2.10-1.6.1.jar"))
    val ssc = new StreamingContext(sparkConf, Seconds(2))
    ssc.sparkContext.setLogLevel("WARN")

    // No filters for now (the filters variant is commented out above)
    val stream = TwitterUtils.createStream(ssc, None)
    val hashtags = stream.flatMap(status => status.getText.split(" ").filter(_.startsWith("@")))

    // Count tags over a 24-hour window, then sort by count, descending
    val topCounts60 = hashtags.map(hashTag => (hashTag, 1))
      .reduceByKeyAndWindow(_ + _, Seconds(86400))
      .map { case (topic, count) => (count, topic) }
      .transform(_.sortByKey(false))

    topCounts60.foreachRDD(rdd => {
      val topList = rdd.take(10)
      println("\nLatest topics in last 24 hours (%s total):".format(rdd.count()))
      topList.foreach { case (count, tag) => println("%s (%s tweets)".format(tag, count)) }
    })

    ssc.start()
    ssc.awaitTermination()
  }
}
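Since a NoSuchMethodError like this usually means a different twitter4j version is picked up at runtime than the one the code was compiled against, I am also planning to log which jar actually provides twitter4j.TwitterStream on the driver and on an executor. A minimal diagnostic sketch (not part of the job above; it only assumes the StreamingContext ssc from the code):

// Diagnostic sketch: print the jar each JVM loads twitter4j.TwitterStream from,
// to spot a conflict between my --jars and what the cluster already ships.
def twitter4jLocation(): String = {
  val src = classOf[twitter4j.TwitterStream].getProtectionDomain.getCodeSource
  if (src == null) "unknown (bootstrap classloader)" else src.getLocation.toString
}
println("driver loads twitter4j from: " + twitter4jLocation())
ssc.sparkContext.parallelize(Seq(1), 1).map { _ =>
  val src = classOf[twitter4j.TwitterStream].getProtectionDomain.getCodeSource
  if (src == null) "unknown (bootstrap classloader)" else src.getLocation.toString
}.collect().foreach(loc => println("executor loads twitter4j from: " + loc))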