
Spark Twitter API error

0 votes
/ January 25, 2020
Hi team,

I have been trying for a week to read popular tags from Twitter in my Spark application.

My environment is listed below; the setup comes from our organization's client.

CDH 5.14

Spark 1.6

Scala 2.10

Below is the error I am getting; I am pasting the full log for reference.

I have tried the things described further down to resolve it, without success.



    [main] INFO com.unraveldata.agent.ResourceCollector - Unravel Sensor
    4.5.1.1rc0013/1.3.11.3 initializing. 20/01/25 18:12:09 INFO DriverProbe: Spark Live Updates Disabled: true 20/01/25 18:12:09
    INFO spark.SparkContext: Running Spark version 1.6.0 20/01/25
    18:12:10 INFO spark.SecurityManager: Changing view acls to: ZB609239
    20/01/25 18:12:10 INFO spark.SecurityManager: Changing modify acls
    to: ZB609239 20/01/25 18:12:10 INFO spark.SecurityManager:
    SecurityManager: authentication disabled; ui acls disabled; users
    with view permissions: Set(ZB609239); users with modify permissions:
    Set(ZB609239) 20/01/25 18:12:10 INFO util.Utils: Successfully
    started service 'sparkDriver' on port 52545. 20/01/25 18:12:10 INFO
    slf4j.Slf4jLogger: Slf4jLogger started 20/01/25 18:12:10 INFO
    Remoting: Starting remoting 20/01/25 18:12:11 INFO Remoting:
    Remoting started; listening on addresses
    :[akka.tcp://sparkDriverActorSystem@10.9.65.243:50540] 20/01/25
    18:12:11 INFO Remoting: Remoting now listens on addresses:
    [akka.tcp://sparkDriverActorSystem@10.9.65.243:50540] 20/01/25
    18:12:11 INFO util.Utils: Successfully started service
    'sparkDriverActorSystem' on port 50540. 20/01/25 18:12:11 INFO
    spark.SparkEnv: Registering MapOutputTracker 20/01/25 18:12:11 INFO
    spark.SparkEnv: Registering BlockManagerMaster 20/01/25 18:12:11
    INFO storage.DiskBlockManager: Created local directory at
    /tmp/blockmgr-2e313029-5057-446c-b3b4-e3fbee2c7afc 20/01/25 18:12:11
    INFO storage.MemoryStore: MemoryStore started with capacity 530.3 MB
    20/01/25 18:12:11 INFO spark.SparkEnv: Registering
    OutputCommitCoordinator 20/01/25 18:12:11 INFO util.Utils:
    Successfully started service 'SparkUI' on port 33950. 20/01/25
    18:12:11 INFO ui.SparkUI: Started SparkUI at
    http://10.9.65.243:33950 20/01/25 18:12:11 INFO spark.SparkContext:
    Added JAR /home/zb609239/twitter4j-core-4.0.4.jar at
    spark://10.9.65.243:52545/jars/twitter4j-core-4.0.4.jar with
    timestamp 1579975931440 20/01/25 18:12:11 INFO spark.SparkContext:
    Added JAR /home/zb609239/twitter4j-stream-4.0.4.jar at
    spark://10.9.65.243:52545/jars/twitter4j-stream-4.0.4.jar with
    timestamp 1579975931441 20/01/25 18:12:11 INFO spark.SparkContext:
    Added JAR spark-streaming-twitter_2.10-1.6.1.jar at
    spark://10.9.65.243:52545/jars/spark-streaming-twitter_2.10-1.6.1.jar
    with timestamp 1579975931441 20/01/25 18:12:12 INFO yarn.Client:
    Requesting a new application from cluster with 70 NodeManagers
    20/01/25 18:12:12 INFO yarn.Client: Verifying our application has
    not requested more than the maximum memory capability of the cluster
    (36864 MB per container) 20/01/25 18:12:12 INFO yarn.Client: Will
    allocate AM container, with 896 MB memory including 384 MB overhead
    20/01/25 18:12:12 INFO yarn.Client: Setting up container launch
    context for our AM 20/01/25 18:12:12 INFO yarn.Client: Setting up
    the launch environment for our AM container 20/01/25 18:12:12 INFO
    yarn.Client: Preparing resources for our AM container 20/01/25
    18:12:12 INFO yarn.YarnSparkHadoopUtil: getting token for:
    hdfs://nameservice1/user/ZB609239/.sparkStaging/application_1579122798111_703147
    20/01/25 18:12:13 INFO hdfs.DFSClient: Created token for ZB609239:
    HDFS_DELEGATION_TOKEN owner=ZB609239@IUSER.IROOT.ADIDOM.COM,
    renewer=yarn, realUser=, issueDate=1579975932976,
    maxDate=1580580732976, sequenceNumber=60369611, masterKeyId=2639 on
    ha-hdfs:nameservice1 20/01/25 18:12:14 INFO hive.metastore: Trying
    to connect to metastore with URI
    thrift://tplhc01c001.iuser.iroot.adidom.com:9083 20/01/25 18:12:14
    INFO hive.metastore: Opened a connection to metastore, current
    connections: 1 20/01/25 18:12:14 INFO hive.metastore: Connected to
    metastore. 20/01/25 18:12:14 INFO metadata.Hive: Registering
    function nvl com.techm.hive.utils.GenericUDFNVL 20/01/25 18:12:14
    INFO metadata.Hive: Registering function row_number
    com.techm.hive.utils.GenericUDFRank 20/01/25 18:12:14 INFO
    metadata.Hive: Registering function sysdate
    com.techm.hive.utils.UDFSysDate 20/01/25 18:12:14 INFO
    metadata.Hive: Registering function sdate
    com.techm.hive.utils.UDFSysDate 20/01/25 18:12:14 INFO
    metadata.Hive: Registering function sysdte
    com.techm.hive.utils.UDFSysDate 20/01/25 18:12:14 INFO
    metadata.Hive: Registering function testfunction
    com.techm.hive.utils.UDFSysDate 20/01/25 18:12:14 INFO
    metadata.Hive: Registering function ups org.hue.udf.MyUpper 20/01/25
    18:12:15 INFO hive.metastore: Closed a connection to metastore,
    current connections: 0 20/01/25 18:12:15 INFO yarn.Client: Uploading
    resource
    file:/tmp/spark-424037da-7567-42d0-8d63-0cd23074b36b/__spark_conf__5410820092210479896.zip
    -> hdfs://nameservice1/user/ZB609239/.sparkStaging/application_1579122798111_703147/__spark_conf__5410820092210479896.zip
    20/01/25 18:12:15 INFO spark.SecurityManager: Changing view acls to:
    ZB609239 20/01/25 18:12:15 INFO spark.SecurityManager: Changing
    modify acls to: ZB609239 20/01/25 18:12:15 INFO
    spark.SecurityManager: SecurityManager: authentication disabled; ui
    acls disabled; users with view permissions: Set(ZB609239); users
    with modify permissions: Set(ZB609239) 20/01/25 18:12:15 INFO
    yarn.Client: Submitting application 703147 to ResourceManager
    20/01/25 18:12:15 INFO impl.YarnClientImpl: Submitted application
    application_1579122798111_703147 20/01/25 18:12:16 INFO yarn.Client:
    Application report for application_1579122798111_703147 (state:
    ACCEPTED) 20/01/25 18:12:16 INFO yarn.Client: client token: Token {
    kind: YARN_CLIENT_TOKEN, service: } diagnostics: N/A
    ApplicationMaster host: N/A ApplicationMaster RPC port: -1 queue:
    root.NONP.HAASAAP0761_10696 start time: 1579975935463 final status:
    UNDEFINED tracking URL:
    http://tplhc01c007.iuser.iroot.adidom.com:8088/proxy/application_1579122798111_703147/
    user: ZB609239 20/01/25 18:12:17 INFO yarn.Client: Application
    report for application_1579122798111_703147 (state: ACCEPTED)
    20/01/25 18:12:18 INFO yarn.Client: Application report for
    application_1579122798111_703147 (state: ACCEPTED) 20/01/25 18:12:19
    INFO yarn.Client: Application report for
    application_1579122798111_703147 (state: ACCEPTED) 20/01/25 18:12:20
    INFO yarn.Client: Application report for
    application_1579122798111_703147 (state: ACCEPTED) 20/01/25 18:12:21
    INFO yarn.Client: Application report for
    application_1579122798111_703147 (state: ACCEPTED) 20/01/25 18:12:22
    INFO yarn.Client: Application report for
    application_1579122798111_703147 (state: ACCEPTED) 20/01/25 18:12:23
    INFO yarn.Client: Application report for
    application_1579122798111_703147 (state: ACCEPTED) 20/01/25 18:12:24
    INFO yarn.Client: Application report for
    application_1579122798111_703147 (state: ACCEPTED) 20/01/25 18:12:25
    INFO yarn.Client: Application report for
    application_1579122798111_703147 (state: ACCEPTED) 20/01/25 18:12:26
    INFO yarn.Client: Application report for
    application_1579122798111_703147 (state: ACCEPTED) 20/01/25 18:12:27
    INFO yarn.Client: Application report for
    application_1579122798111_703147 (state: ACCEPTED) 20/01/25 18:12:28
    INFO yarn.Client: Application report for
    application_1579122798111_703147 (state: ACCEPTED) 20/01/25 18:12:29
    INFO yarn.Client: Application report for
    application_1579122798111_703147 (state: ACCEPTED) 20/01/25 18:12:30
    INFO yarn.Client: Application report for
    application_1579122798111_703147 (state: ACCEPTED) 20/01/25 18:12:31
    INFO yarn.Client: Application report for
    application_1579122798111_703147 (state: ACCEPTED) 20/01/25 18:12:32
    INFO yarn.Client: Application report for
    application_1579122798111_703147 (state: ACCEPTED) 20/01/25 18:12:33
    INFO yarn.Client: Application report for
    application_1579122798111_703147 (state: ACCEPTED) 20/01/25 18:12:34
    INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint:
    ApplicationMaster registered as NettyRpcEndpointRef(null) 20/01/25
    18:12:34 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter.
    org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter,
    Map(PROXY_HOSTS ->
    tplhc01c007.iuser.iroot.adidom.com,tplhc01c009.iuser.iroot.adidom.com,
    PROXY_URI_BASES ->
    http://tplhc01c007.iuser.iroot.adidom.com:8088/proxy/application_1579122798111_703147,http://tplhc01...),
    /proxy/application_1579122798111_703147 20/01/25 18:12:34 INFO
    ui.JettyUtils: Adding filter:
    org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter 20/01/25
    18:12:34 INFO yarn.Client: Application report for
    application_1579122798111_703147 (state: ACCEPTED) 20/01/25 18:12:35
    INFO yarn.Client: Application report for
    application_1579122798111_703147 (state: RUNNING) 20/01/25 18:12:35
    INFO yarn.Client: client token: Token { kind: YARN_CLIENT_TOKEN,
    service: } diagnostics: N/A ApplicationMaster host: 10.9.65.223
    ApplicationMaster RPC port: 0 queue: root.NONP.HAASAAP0761_10696
    start time: 1579975935463 final status: UNDEFINED tracking URL:
    http://tplhc01c007.iuser.iroot.adidom.com:8088/proxy/application_1579122798111_703147/
    user: ZB609239 20/01/25 18:12:35 INFO
    cluster.YarnClientSchedulerBackend: Application
    application_1579122798111_703147 has started running. 20/01/25
    18:12:35 INFO util.Utils: Successfully started service
    'org.apache.spark.network.netty.NettyBlockTransferService' on port
    60197. 20/01/25 18:12:35 INFO netty.NettyBlockTransferService: Server created on 60197 20/01/25 18:12:35 INFO storage.BlockManager:
    external shuffle service port = 7337 20/01/25 18:12:35 INFO
    storage.BlockManagerMaster: Trying to register BlockManager 20/01/25
    18:12:35 INFO storage.BlockManagerMasterEndpoint: Registering block
    manager 10.9.65.243:60197 with 530.3 MB RAM, BlockManagerId(driver,
    10.9.65.243, 60197) 20/01/25 18:12:35 INFO storage.BlockManagerMaster: Registered BlockManager 20/01/25
    18:12:36 INFO scheduler.EventLoggingListener: Logging events to
    hdfs://nameservice1/user/spark/applicationHistory/application_1579122798111_703147
    20/01/25 18:12:36 INFO spark.SparkContext: Registered listener
    com.cloudera.spark.lineage.ClouderaNavigatorListener 20/01/25
    18:12:36 INFO spark.SparkContext: Registered listener
    org.apache.spark.UnravelListener 20/01/25 18:12:41 INFO
    cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for
    scheduling beginning after waiting
    maxRegisteredResourcesWaitingTime: 30000(ms)
    20/01/25 18:12:59 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 2.0 (TID 70, tplhc01d191.iuser.iroot.adidom.com, executor 2): java.lang.NoSuchMethodError: twitter4j.TwitterStream.addListener(Ltwitter4j/StreamListener;)V
        at org.apache.spark.streaming.twitter.TwitterReceiver.onStart(TwitterInputDStream.scala:72)
        at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
        at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
        at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:575)
        at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:565)
        at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2022)
        at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2022)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

    20/01/25 18:13:00 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 3.0 (TID 72, tplhc01d191.iuser.iroot.adidom.com, executor 1): java.lang.NoSuchMethodError: twitter4j.TwitterStream.addListener(Ltwitter4j/StreamListener;)V
        at org.apache.spark.streaming.twitter.TwitterReceiver.onStart(TwitterInputDStream.scala:72)
        at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
        at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
        at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:575)
        at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:565)
        at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2022)
        at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2022)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

    20/01/25 18:13:01 ERROR cluster.YarnScheduler: Lost executor 2 on
    tplhc01d191.iuser.iroot.adidom.com: Container marked as failed:
    container_e168_1579122798111_703147_01_000003 on host:
    tplhc01d191.iuser.iroot.adidom.com. Exit status: 50. Diagnostics:
    Exception from container-launch. Container id:
    container_e168_1579122798111_703147_01_000003 Exit code: 50 Stack
    trace: ExitCodeException exitCode=50: at
    org.apache.hadoop.util.Shell.runCommand(Shell.java:604) at
    org.apache.hadoop.util.Shell.run(Shell.java:507) at
    org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
    at
    org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:399)
    at
    org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at
    org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

    Shell output: main : command provided 1 main : run as user is
    ZB609239 main : requested yarn user is ZB609239 Writing to tmp file
    /data/14/yarn/nm/nmPrivate/application_1579122798111_703147/container_e168_1579122798111_703147_01_000003/container_e168_1579122798111_703147_01_000003.pid.tmp
    Writing to cgroup task files...


    Container exited with a non-zero exit code 50

    20/01/25 18:13:01 WARN scheduler.TaskSetManager: Lost task 0.0 in
    stage 5.0 (TID 73, tplhc01d191.iuser.iroot.adidom.com, executor 2):
    ExecutorLostFailure (executor 2 exited caused by one of the running
    tasks) Reason: Container marked as failed:
    container_e168_1579122798111_703147_01_000003 on host:
    tplhc01d191.iuser.iroot.adidom.com. Exit status: 50. Diagnostics:
    Exception from container-launch. Container id:
    container_e168_1579122798111_703147_01_000003 Exit code: 50 Stack
    trace: ExitCodeException exitCode=50: at
    org.apache.hadoop.util.Shell.runCommand(Shell.java:604) at
    org.apache.hadoop.util.Shell.run(Shell.java:507) at
    org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
    at
    org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:399)
    at
    org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at
    org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

    Shell output: main : command provided 1 main : run as user is
    ZB609239 main : requested yarn user is ZB609239 Writing to tmp file
    /data/14/yarn/nm/nmPrivate/application_1579122798111_703147/container_e168_1579122798111_703147_01_000003/container_e168_1579122798111_703147_01_000003.pid.tmp
    Writing to cgroup task files...


    Container exited with a non-zero exit code 50

    20/01/25 18:13:01 WARN
    cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked
    as failed: container_e168_1579122798111_703147_01_000003 on host:
    tplhc01d191.iuser.iroot.adidom.com. Exit status: 50. Diagnostics:
    Exception from container-launch. Container id:
    container_e168_1579122798111_703147_01_000003 Exit code: 50 Stack
    trace: ExitCodeException exitCode=50: at
    org.apache.hadoop.util.Shell.runCommand(Shell.java:604) at
    org.apache.hadoop.util.Shell.run(Shell.java:507) at
    org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
    at
    org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:399)
    at
    org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at
    org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

    Shell output: main : command provided 1 main : run as user is
    ZB609239 main : requested yarn user is ZB609239 Writing to tmp file
    /data/14/yarn/nm/nmPrivate/application_1579122798111_703147/container_e168_1579122798111_703147_01_000003/container_e168_1579122798111_703147_01_000003.pid.tmp
    Writing to cgroup task files...


    Container exited with a non-zero exit code 50

    20/01/25 18:13:02 ERROR cluster.YarnScheduler: Lost executor 1 on
    tplhc01d191.iuser.iroot.adidom.com: Container marked as failed:
    container_e168_1579122798111_703147_01_000002 on host:
    tplhc01d191.iuser.iroot.adidom.com. Exit status: 50. Diagnostics:
    Exception from container-launch. Container id:
    container_e168_1579122798111_703147_01_000002 Exit code: 50 Stack
    trace: ExitCodeException exitCode=50: at
    org.apache.hadoop.util.Shell.runCommand(Shell.java:604) at
    org.apache.hadoop.util.Shell.run(Shell.java:507) at
    org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
    at
    org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:399)
    at
    org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at
    org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

    Shell output: main : command provided 1 main : run as user is
    ZB609239 main : requested yarn user is ZB609239 Writing to tmp file
    /data/18/yarn/nm/nmPrivate/application_1579122798111_703147/container_e168_1579122798111_703147_01_000002/container_e168_1579122798111_703147_01_000002.pid.tmp
    Writing to cgroup task files...


    Container exited with a non-zero exit code 50

    20/01/25 18:13:02 WARN
    cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked
    as failed: container_e168_1579122798111_703147_01_000002 on host:
    tplhc01d191.iuser.iroot.adidom.com. Exit status: 50. Diagnostics:
    Exception from container-launch. Container id:
    container_e168_1579122798111_703147_01_000002 Exit code: 50 Stack
    trace: ExitCodeException exitCode=50: at
    org.apache.hadoop.util.Shell.runCommand(Shell.java:604) at
    org.apache.hadoop.util.Shell.run(Shell.java:507) at
    org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
    at
    org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:399)
    at
    org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at
    org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

    Shell output: main : command provided 1 main : run as user is
    ZB609239 main : requested yarn user is ZB609239 Writing to tmp file
    /data/18/yarn/nm/nmPrivate/application_1579122798111_703147/container_e168_1579122798111_703147_01_000002/container_e168_1579122798111_703147_01_000002.pid.tmp
    Writing to cgroup task files...


    Container exited with a non-zero exit code 50

    20/01/25 18:13:02 WARN scheduler.TaskSetManager: Lost task 1.0 in
    stage 5.0 (TID 74, tplhc01d191.iuser.iroot.adidom.com, executor 1):
    ExecutorLostFailure (executor 1 exited caused by one of the running
    tasks) Reason: Container marked as failed:
    container_e168_1579122798111_703147_01_000002 on host:
    tplhc01d191.iuser.iroot.adidom.com. Exit status: 50. Diagnostics:
    Exception from container-launch. Container id:
    container_e168_1579122798111_703147_01_000002 Exit code: 50 Stack
    trace: ExitCodeException exitCode=50: at
    org.apache.hadoop.util.Shell.runCommand(Shell.java:604) at
    org.apache.hadoop.util.Shell.run(Shell.java:507) at
    org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
    at
    org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:399)
    at
    org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at
    org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

    Shell output: main : command provided 1 main : run as user is
    ZB609239 main : requested yarn user is ZB609239 Writing to tmp file
    /data/18/yarn/nm/nmPrivate/application_1579122798111_703147/container_e168_1579122798111_703147_01_000002/container_e168_1579122798111_703147_01_000002.pid.tmp
    Writing to cgroup task files...


    Container exited with a non-zero exit code 50

The command I run looks like this:

    spark-submit --queue NONP.HAASXXXX_XXX --conf spark.ui.port=0 \
      --jars "/home/XXXXX/spark-streaming-twitter_2.10-1.6.1.jar","/home/XXXXX/twitter4j-core-4.0.4.jar","/home/XXXXX/twitter4j-stream-4.0.4.jar" \
      --class com.sparkStreaming.TwitterTags TwitterTagsBT.jar XXXX XXXXXRR RRRREEXX WWEERRDD twittertagfilter \
      --conf spark.yarn.user.classpath.first=true \
      --conf spark.shuffle.memoryFraction=0.5 \
      --conf spark.yarn.executor.memoryOverhead=8192 \
      --conf spark.akka.timeout=300 \
      --conf spark.storage.blockManagerSlaveTimeoutMs=300000 \
      --conf spark.yarn.driver.memoryOverhead=2048 \
      --executor-memory 25g --driver-memory 4g --num-executors 45 --executor-cores 5 \
      --conf spark.executor.extrajavaoptions="-XX:-UseGCOverheadLimit" \
      --conf spark.executor.userClassPathFirst=true \
      --conf spark.driver.userClassPathFirst=truels

My code is shown below. I should add that I have also tried twitter4j versions 3.0.6, 3.0.3 and 4.0.7; with all of them I get the same error as above, and the stream does not connect. Please advise what the problem might be.

    package com.sparkStreaming

    import org.apache.spark.streaming._
    import org.apache.spark.sql._
    import org.apache.spark.sql.hive.orc._
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.Row
    import org.apache.spark._
    import org.apache.spark.SparkContext._
    import java.nio.charset.CodingErrorAction
    import scala.io.Codec
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark._
    import org.apache.spark.util.Utils
    import org.apache.spark.sql.Dataset
    import org.apache.spark.sql.expressions

    import org.apache.spark.sql.functions.{ concat, lit }
    import org.apache.spark.sql.types._
    import org.apache.spark.sql.functions.udf
    import java.util.Calendar
    import org.apache.hadoop.fs.FileSystem
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.security.UserGroupInformation.HadoopConfiguration
    import org.apache.log4j._
    import org.apache.spark.streaming.twitter.TwitterUtils
    import org.apache.spark.streaming.twitter._
    import twitter4j.Status;
    import twitter4j.auth.Authorization
    import twitter4j.auth.OAuthAuthorization
    import twitter4j.conf.Configuration
    import twitter4j.conf.ConfigurationBuilder

    object TwitterTags {

      def main(args: Array[String]) {
        if (args.length < 3) {
          System.err.println("Usage: TwitterSentiments <consumer key> <consumer secret> " +
            "<access token> <access token secret> [<filters>]")
          System.exit(1)
        }
        // Set logging level if log4j not configured (override by adding log4j.properties to classpath)
        if (!Logger.getRootLogger.getAllAppenders.hasMoreElements) {
          Logger.getRootLogger.setLevel(Level.WARN)
        }
        val Array(consumerKey, consumerSecret, accessToken, accessTokenSecret) = args.take(4)
        //val filters = args.takeRight(args.length - 4)
        // Set the system properties so that Twitter4j library used by Twitter stream
        // can use them to generate OAuth credentials
        System.setProperty("twitter4j.oauth.consumerKey", consumerKey)
        System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret)
        System.setProperty("twitter4j.oauth.accessToken", accessToken)
        System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret)

        val sparkConf = new SparkConf().setAppName("TwitterHashTagJoinSentiments").
          setJars(Array("/home/zb609239/twitter4j-core-4.0.4.jar,/home/zb609239/twitter4j-stream-4.0.4.jar,spark-streaming-twitter_2.10-1.6.1.jar"))

        val ssc = new StreamingContext(sparkConf, Seconds(2))
        ssc.sparkContext.setLogLevel("WARN")
        // Filtered variant, disabled because `filters` is commented out above
        //val stream = TwitterUtils.createStream(ssc, None, filters)
        val stream = TwitterUtils.createStream(ssc,None)
        val hashtags = stream.flatMap(status => status.getText.split(" ").filter(_.startsWith("@")))

        val topCounts60 = hashtags.map(hashTag => (hashTag, 1)).reduceByKeyAndWindow(_ + _, Seconds(86400))
          .map { case (topic, count) => (count, topic) }
          .transform(_.sortByKey(false))

        topCounts60.foreachRDD(rdd => {
          val topList = rdd.take(10)
          println("\nLatest topics in last 24 hours (%s total):".format(rdd.count()))
          topList.foreach { case (count, tag) => println("%s (%s tweets)".format(tag, count)) }
        })

        ssc.start()
        ssc.awaitTermination()

      }

    }

1 Answer

0 votes
/ January 26, 2020

It depends on how spark-streaming-twitter_2.10-1.6.1.jar was compiled, but you can:

  • Spark is built with Maven POMs. Inspect the pom.xml for that jar to determine which dependencies it was built against. For example, https://github.com/cloudera/spark/blob/cdh5-base-1.6.0/external/twitter/pom.xml was compiled against version 4.0.4 of the twitter4j-stream artifact.

  • If your project uses a build tool such as SBT or Maven, generate the dependency tree (something like sbt "coursierDependencyTree" or mvn dependency:tree, depending on the tooling and plugins in your environment). Find the incompatible artifact name ("twitter4j-stream") and check which version it resolves to. Note that a transitive dependency may have been overridden, in which case the tool's output will tell you.

  • Inspect the method signatures in the jar files (or in your own fat jar with all dependencies) using javap to confirm that they match what is expected.
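
If running javap against the jars is not conclusive, the same check can be done from inside the JVM with plain Java reflection. The snippet below is only a sketch: `ClasspathCheck`, `loadedFrom` and `signatures` are names invented for this illustration, not part of Spark or twitter4j. The descriptor in your error, `addListener(Ltwitter4j/StreamListener;)V`, means the receiver expects a method that takes a `twitter4j.StreamListener` and returns `void`, so the printed signatures show whether the class actually loaded offers that exact overload.

```scala
import scala.util.Try

// Hypothetical helper for illustration only; uses nothing beyond java.lang reflection.
object ClasspathCheck {

  /** Location (jar or directory URL) the class was loaded from.
    * None if the class is missing or the JVM reports no code source for it. */
  def loadedFrom(className: String): Option[String] =
    Try(Class.forName(className)).toOption.flatMap { cls =>
      Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
    }

  /** Every public overload of `methodName`, rendered as "returnType name(paramTypes)".
    * Empty if the class cannot be found. */
  def signatures(className: String, methodName: String): Seq[String] =
    Try(Class.forName(className)).toOption.toSeq.flatMap { cls =>
      cls.getMethods.toSeq
        .filter(_.getName == methodName)
        .map { m =>
          val params = m.getParameterTypes.map(_.getName).mkString(", ")
          s"${m.getReturnType.getName} ${m.getName}($params)"
        }
    }

  def main(args: Array[String]): Unit = {
    // Defaults point at a JDK class so the sketch runs without extra jars.
    val cls    = if (args.length > 0) args(0) else "java.lang.String"
    val method = if (args.length > 1) args(1) else "isEmpty"
    println(s"$cls loaded from: ${loadedFrom(cls).getOrElse("<no code source or not found>")}")
    signatures(cls, method).foreach(println)
  }
}
```

For your case you would pass `twitter4j.TwitterStream` and `addListener`. Because the error is raised on the executors, the driver's classpath may not tell the whole story; one option (again an assumption about your setup) is to call these two functions from inside a small task so that the result reflects what the executor JVM actually loaded.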

...