Error when using pyspark: ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
0 votes
asked 19 February 2019

I am trying to run a pyspark script remotely on AWS EMR, following the instructions provided by AWS. However, when I try to submit my script, I get the following exception:

Traceback (most recent call last):
  File "/home/aco/src/test_remote_pyspark.py", line 19, in <module>
    spark = SparkSession.builder.config(conf=conf).getOrCreate()
  File "/home/aco/.local/share/virtualenvs/prototypes-dS3RdFhP/lib/python3.6/site-packages/pyspark/sql/session.py", line 173, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/home/aco/.local/share/virtualenvs/prototypes-dS3RdFhP/lib/python3.6/site-packages/pyspark/context.py", line 349, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/home/aco/.local/share/virtualenvs/prototypes-dS3RdFhP/lib/python3.6/site-packages/pyspark/context.py", line 118, in __init__
    conf, jsc, profiler_cls)
  File "/home/aco/.local/share/virtualenvs/prototypes-dS3RdFhP/lib/python3.6/site-packages/pyspark/context.py", line 180, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "/home/aco/.local/share/virtualenvs/prototypes-dS3RdFhP/lib/python3.6/site-packages/pyspark/context.py", line 288, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "/home/aco/.local/share/virtualenvs/prototypes-dS3RdFhP/lib/python3.6/site-packages/py4j/java_gateway.py", line 1525, in __call__
    answer, self._gateway_client, None, self._fqn)
  File "/home/aco/.local/share/virtualenvs/prototypes-dS3RdFhP/lib/python3.6/site-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
    at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:181)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:168)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:160)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:178)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
    ... 20 more
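
For context, line 19 of the script in the traceback is essentially the standard way of building a session against a YARN cluster in client mode. Below is a minimal sketch of that setup; the app name and master value are illustrative assumptions, not my exact configuration:

from pyspark import SparkConf
from pyspark.sql import SparkSession

# Minimal remote-submission sketch (assumed values, for illustration only)
conf = SparkConf().setAppName("remote-test").setMaster("yarn")

# This is the call that fails with the NoClassDefFoundError shown above
spark = SparkSession.builder.config(conf=conf).getOrCreate()
print(spark.range(10).count())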

I have SPARK_HOME set to the directory where I unpacked Spark, and I can see that it contains a number of jars:

$ echo $SPARK_HOME
/home/aco/Downloads/spark-2.4.0-bin-hadoop2.7
$ ls $SPARK_HOME/jars | head
activation-1.1.1.jar
aircompressor-0.10.jar
antlr-2.7.7.jar
antlr4-runtime-4.7.jar
antlr-runtime-3.4.jar
aopalliance-1.0.jar
aopalliance-repackaged-2.4.0-b34.jar
apacheds-i18n-2.0.0-M15.jar
apacheds-kerberos-codec-2.0.0-M15.jar
apache-log4j-extras-1.2.17.jar
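
The missing class, com.sun.jersey.api.client.config.ClientConfig, belongs to the old Jersey 1.x client. A quick way to see which Jersey jars are bundled with this Spark distribution is to list them from the jars directory; a small sketch (it only assumes SPARK_HOME is exported, as above):

import glob, os

# List any bundled Jersey jars to see whether a 1.x client jar is present at all
spark_home = os.environ["SPARK_HOME"]
print(glob.glob(os.path.join(spark_home, "jars", "*jersey*")))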

I don't have the slightest idea how to debug this. Any ideas?

...