Why does pyspark fail with "Error while instantiating 'org.apache.spark.sql.internal.SessionStateBuilder'"?
October 26, 2019

When trying to set up PySpark and run it from PyCharm (via Databricks on AWS), I get the following error:

It appears that the cluster you are trying to connect to (...) does not have the
Spark service enabled. To enable the Spark service on this cluster, go to
https://....cloud.databricks.com/?o=...#setting/clusters//#setting/clusters/.../configuration
and add the following to the cluster's Spark config:

spark.databricks.service.server.enabled true

I have set this to true. I even created a new cluster on Databricks with this value set to true from the start, and I still get the same error!
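One way to rule out a typo or a wrong separator in the setting is to check the exact text pasted into the cluster's Spark config box. The following is a minimal sketch, assuming the box holds one whitespace-separated `key value` pair per line, as in the error message above. `parse_spark_conf` and `service_enabled` are hypothetical helper names, not part of any Databricks API.

```python
def parse_spark_conf(text):
    """Parse 'key value' lines into a dict (assumed format of the Spark config box)."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)  # split on the first run of whitespace
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        conf[key] = value
    return conf


def service_enabled(text):
    """True only if the key is present and its value is exactly 'true' (case-insensitive)."""
    conf = parse_spark_conf(text)
    return conf.get("spark.databricks.service.server.enabled", "").lower() == "true"


# Hypothetical sample: the line exactly as the error message asks for it.
sample = "spark.databricks.service.server.enabled true\n"
print(service_enabled(sample))
```

Under this assumed format, a line written as `spark.databricks.service.server.enabled=true` would be read as a single key with no value, so the check would report the service as not enabled.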

Full error message:

19/10/25 14:09:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/10/25 14:09:04 WARN MetricsSystem: Using default name SparkStatusTracker for source because neither spark.metrics.namespace nor spark.app.id is set.
Testing simple count
Traceback (most recent call last):
  File "/Users/.../anaconda3/envs/dbconnect/lib/python3.5/site-packages/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/Users/.../anaconda3/envs/dbconnect/lib/python3.5/site-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o20.range.
: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.internal.SessionStateBuilder':
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1178)
    at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:170)
    at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:169)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:169)
    at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:166)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:193)
    at org.apache.spark.sql.SparkSession.range(SparkSession.scala:609)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.databricks.service.SparkServiceConnectionException: It appears that the cluster you are trying to connect to (...) does not have the
Spark service enabled. To enable the Spark service on this cluster, go to
...#setting/clusters//#setting/clusters/.../configuration
and add the following to the cluster's Spark config:

spark.databricks.service.server.enabled true

    at com.databricks.service.SparkServiceRPCClient.doPost(SparkServiceRPCClient.scala:104)
    at com.databricks.service.SparkServiceRPCClient.executeRPC0(SparkServiceRPCClient.scala:66)
    at com.databricks.service.SparkServiceRPCClientStub.com$databricks$service$SparkServiceRPCClientStub$$executeRPC(SparkServiceRPCClientStub.scala:133)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollStatuses$1.apply(SparkServiceRPCClientStub.scala:486)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollStatuses$1.apply(SparkServiceRPCClientStub.scala:483)
    at com.databricks.spark.util.Log4jUsageLogger.recordOperation(UsageLogger.scala:172)
    at com.databricks.spark.util.UsageLogging$class.recordOperation(UsageLogger.scala:297)
    at com.databricks.service.SparkServiceRPCClientStub.recordOperation(SparkServiceRPCClientStub.scala:60)
    at com.databricks.service.SparkServiceRPCClientStub.pollStatuses(SparkServiceRPCClientStub.scala:483)
    at com.databricks.service.SparkServiceRPCClientStub.com$databricks$service$SparkServiceRPCClientStub$$pollAndUpdateStatuses0(SparkServiceRPCClientStub.scala:454)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollAndUpdateStatuses$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SparkServiceRPCClientStub.scala:435)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollAndUpdateStatuses$1$$anonfun$apply$mcV$sp$1.apply(SparkServiceRPCClientStub.scala:433)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollAndUpdateStatuses$1$$anonfun$apply$mcV$sp$1.apply(SparkServiceRPCClientStub.scala:433)
    at com.databricks.service.SparkServiceRPCClientStub.com$databricks$service$SparkServiceRPCClientStub$$withPollLock(SparkServiceRPCClientStub.scala:445)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollAndUpdateStatuses$1.apply$mcV$sp(SparkServiceRPCClientStub.scala:432)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollAndUpdateStatuses$1.apply(SparkServiceRPCClientStub.scala:430)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollAndUpdateStatuses$1.apply(SparkServiceRPCClientStub.scala:430)
    at com.databricks.spark.util.Log4jUsageLogger.recordOperation(UsageLogger.scala:172)
    at com.databricks.spark.util.UsageLogging$class.recordOperation(UsageLogger.scala:297)
    at com.databricks.service.SparkServiceRPCClientStub.recordOperation(SparkServiceRPCClientStub.scala:60)
    at com.databricks.service.SparkServiceRPCClientStub.pollAndUpdateStatuses(SparkServiceRPCClientStub.scala:430)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$getServerHadoopConf$1.apply(SparkServiceRPCClientStub.scala:408)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$getServerHadoopConf$1.apply(SparkServiceRPCClientStub.scala:407)
    at com.databricks.service.SparkServiceRPCClientStub.com$databricks$service$SparkServiceRPCClientStub$$withPollLock(SparkServiceRPCClientStub.scala:445)
    at com.databricks.service.SparkServiceRPCClientStub.getServerHadoopConf(SparkServiceRPCClientStub.scala:407)
    at com.databricks.service.SparkClient$.getServerHadoopConf(SparkClient.scala:245)
    at com.databricks.spark.util.SparkClientContext$.getServerHadoopConf(SparkClientContext.scala:222)
    at org.apache.spark.SparkContext$$anonfun$hadoopConfiguration$1.apply(SparkContext.scala:317)
    at org.apache.spark.SparkContext$$anonfun$hadoopConfiguration$1.apply(SparkContext.scala:312)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
    at org.apache.spark.SparkContext.hadoopConfiguration(SparkContext.scala:311)
    at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:67)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:145)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:145)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:145)
    at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:144)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:291)
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1175)
    ... 18 more


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/.../Documents/School/Technical/Data Science/Spark - The Definitive Guide/Examples/Spark-The_Definitive_Guide/test.py", line 9, in <module>
    print(spark.range(100).count())
  File "/Users/.../anaconda3/envs/dbconnect/lib/python3.5/site-packages/pyspark/sql/session.py", line 337, in range
    jdf = self._jsparkSession.range(0, int(start), int(step), int(numPartitions))
  File "/Users/.../anaconda3/envs/dbconnect/lib/python3.5/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/Users/.../anaconda3/envs/dbconnect/lib/python3.5/site-packages/pyspark/sql/utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: "Error while instantiating 'org.apache.spark.sql.internal.SessionStateBuilder':"

I have set the configuration variable spark.databricks.service.server.enabled to true, but I still get this error.
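The Python-side exception only reports the SessionStateBuilder failure. The underlying cause is on the `Caused by:` line of the Java stack trace. As a minimal sketch, assuming the trace has been captured as a string, the hypothetical helper `root_causes` below pulls out those lines so the real error is easier to spot. The sample trace is abbreviated from the output above.

```python
def root_causes(trace):
    """Return the text following each 'Caused by:' marker in a Java stack trace."""
    prefix = "Caused by:"
    return [
        line.strip()[len(prefix):].strip()
        for line in trace.splitlines()
        if line.strip().startswith(prefix)
    ]


# Abbreviated copy of the trace from this post.
trace = """java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.internal.SessionStateBuilder':
    at org.apache.spark.sql.SparkSession.range(SparkSession.scala:609)
Caused by: com.databricks.service.SparkServiceConnectionException: It appears that the cluster you are trying to connect to (...) does not have the
Spark service enabled.
    ... 18 more"""

for cause in root_causes(trace):
    print(cause)
```

For this trace the helper returns the single `SparkServiceConnectionException` line, which points at the connection to the cluster rather than at session state construction itself.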

...