Невозможно создать SparkSession в DSE 6.7 - PullRequest
0 голосов
/ 21 декабря 2018

Я обновил DSE 5.1 ​​до DSE 6.7.Мы используем аналитический узел с поддержкой spark для заданий spark.

Я создаю объект sparkSession с использованием:

SparkSession
  .builder()
  .appName("App Name")
  .config("spark.cassandra.connection.host", "host")
  .config("spark.cassandra.connection.port", "port")
  .config("spark.cassandra.auth.username", "username")
  .config("spark.cassandra.auth.password", "password"))
  .enableHiveSupport()
  .getOrCreate();

В DSE 5.1 ​​он работал нормально, но в DSE 6.7 выдает следующую ошибку

Exception: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062)
    at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137)
    at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136)
    at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133)
    at org.apache.spark.sql.SparkSession.conf$lzycompute(SparkSession.scala:162)
    at org.apache.spark.sql.SparkSession.conf(SparkSession.scala:162)
    at org.apache.spark.sql.cassandra.package$CassandraSparkSessionFunctions$$anonfun$setCassandraConf$extension1$4.apply(package.scala:148)
    at org.apache.spark.sql.cassandra.package$CassandraSparkSessionFunctions$$anonfun$setCassandraConf$extension1$4.apply(package.scala:148)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
    at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
    at org.apache.spark.sql.cassandra.package$CassandraSparkSessionFunctions$.setCassandraConf$extension1(package.scala:148)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.DseSparkSubmit$.org$apache$spark$deploy$DseSparkSubmit$$runMain(DseSparkSubmit.scala:737)
    at org.apache.spark.deploy.DseSparkSubmit$.doRunMain$1(DseSparkSubmit.scala:171)
    at org.apache.spark.deploy.DseSparkSubmit$.submit(DseSparkSubmit.scala:196)
    at org.apache.spark.deploy.DseSparkSubmit$.main(DseSparkSubmit.scala:110)
    at org.apache.spark.deploy.DseSparkSubmitBootstrapper$.main(DseSparkSubmitBootstrapper.scala:103)
    at org.apache.spark.deploy.DseSparkSubmitBootstrapper.main(DseSparkSubmitBootstrapper.scala)
Caused by: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
    at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)
    at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:105)
    at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:93)
    at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)
    at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54)
    at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
    at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:35)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:289)
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1059)
    ... 26 common frames omitted
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:525)
    at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:191)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:362)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:266)
    at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
    at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:195)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
    ... 35 common frames omitted
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
    at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
    ... 49 common frames omitted
Caused by: java.lang.reflect.InvocationTargetException: null
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
    ... 55 common frames omitted
Caused by: java.lang.NoSuchMethodError: com.datastax.spark.connector.cql.CassandraConnectorConf.fromSparkConf(Lorg/apache/spark/SparkConf;)Lcom/datastax/spark/connector/cql/CassandraConnectorConf;
    at com.datastax.bdp.hadoop.hive.metastore.SchemaManagerService.getCassandraConnector(SchemaManagerService.java:145)
    at com.datastax.bdp.hadoop.hive.metastore.MetaStorePersister.<init>(MetaStorePersister.java:71)
    at com.datastax.bdp.hadoop.hive.metastore.MetaStorePersister.getInstance(MetaStorePersister.java:59)
    at com.datastax.bdp.hadoop.hive.metastore.CassandraHiveMetaStore.setConf(CassandraHiveMetaStore.java:70)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
    at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
    at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
    ... 60 common frames omitted

Ниже приведены зависимости, которые я использую.

<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.11.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.11</artifactId>
        <version>2.3.0</version>
    </dependency>
    <dependency>
        <groupId>com.datastax.spark</groupId>
        <artifactId>spark-cassandra-connector_2.11</artifactId>
        <version>2.3.0</version>
    </dependency>
    <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>19.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-api-scala_2.11</artifactId>
        <version>11.0</version>
    </dependency>
    <dependency>
        <groupId>org.json4s</groupId>
        <artifactId>json4s-native_2.11</artifactId>
        <version>3.6.1</version>
    </dependency>
    <dependency>
        <groupId>org.json4s</groupId>
        <artifactId>json4s-jackson_2.11</artifactId>
        <version>3.6.1</version>
    </dependency>
</dependencies>

1 Ответ

0 голосов
/ 21 декабря 2018

DSE Analytics основана на Spark 2.2, а не на Spark 2.3.0 в качестве используемого вами коннектора.А для DSE я рекомендую использовать зависимости DSE для конкретной версии, например:

 <dependency>
    <groupId>com.datastax.dse</groupId>
    <artifactId>dse-spark-dependencies</artifactId>
    <version>${dse.version}</version>
    <scope>provided</scope>
  </dependency>

Примеры сборки Spark уже обновлены до 6.7.0, поэтому вы можете получать зависимости оттуда: https://github.com/datastax/SparkBuildExamples/blob/master/scala/maven/dse/pom.xml

...