Apache Spark does not connect to the Hive metastore (database not found)
0 votes / 16 March 2020

I have Java Spark code in which I am trying to connect to a Hive database, but the session only sees the default database and throws a NoSuchDatabaseException. I tried the following to point Spark at the Hive metastore:

  1. Set the Hive metastore URI in the Spark conf in code
  2. Pass the Spark conf via spark-submit (see the example command after this list)
  3. Add hive-site.xml to the resources folder
  4. Copy hive-site.xml into the Spark conf directory (/etc/spark2/conf/hive-site.xml)
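
For step 2, this is roughly the form the spark-submit flag took; the spark.hadoop. prefix (which Spark copies into the Hadoop/Hive configuration) and the host/port here are placeholders, not my real values:

    spark-submit ... --conf "spark.hadoop.hive.metastore.uris=thrift://<metastore-host>:9083" ...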

Also, the Hive configuration file loaded at runtime is the same as /etc/hive/conf/hive-site.xml.
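
To double-check which metastore the session actually ends up using, one can print two settings at runtime. This is a minimal diagnostic sketch, assuming spark is the session built below (the keys are the standard Spark/Hive configuration names):

// Active catalog implementation: should be "hive", not "in-memory"
System.out.println(spark.conf().get("spark.sql.catalogImplementation"));

// The metastore URI carried by the session's Hadoop configuration
System.out.println(spark.sparkContext().hadoopConfiguration().get("hive.metastore.uris"));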

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkConf sparkConf = new SparkConf();
sparkConf.setAppName("example");
// Note: this creates a SparkContext before the SparkSession builder runs.
JavaSparkContext sc = new JavaSparkContext(sparkConf);
final SparkSession spark = SparkSession
                .builder()
                .appName("Java Spark Hive Example")
                .config("hive.metastore.uris", "thrift://***:1234")
                .config("spark.sql.uris", "thrift://***:1234")
                .config("hive.metastore.warehouse.dir", "hdfs://***:1234/user/hive/warehouse/")
                .enableHiveSupport()
                .getOrCreate();

// "sample" is a java.util.List<sampleClass> built earlier (omitted here).
JavaRDD<sampleClass> rdd = sc.parallelize(sample);

Dataset<Row> df2 = spark.createDataFrame(rdd, sampleClass.class);

spark.sql("show databases").show();

The spark-submit log is below.

    spark-submit --class sampleClass \
> --master local --deploy-mode client --executor-memory 1g \
> --name sparkTest --conf "spark.app.id=SampleLoad" \
> --files /etc/spark/conf/hive-site.xml load-1.0-SNAPSHOT-all.jar
20/03/16 12:33:19 INFO SparkContext: Running Spark version 2.3.0.2.6.5.0-292
20/03/16 12:33:19 INFO SparkContext: Submitted application: SampleLoad
20/03/16 12:33:19 INFO SecurityManager: Changing view acls to: root,User
20/03/16 12:33:19 INFO SecurityManager: Changing modify acls to: root,User
20/03/16 12:33:19 INFO SecurityManager: Changing view acls groups to:
20/03/16 12:33:19 INFO SecurityManager: Changing modify acls groups to:
20/03/16 12:33:19 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root, User); groups with view permissions: Set(); users  with modify permissions: Set(root, User); groups with modify permissions: Set()
20/03/16 12:33:19 INFO Utils: Successfully started service 'sparkDriver' on port 35746.
20/03/16 12:33:19 INFO SparkEnv: Registering MapOutputTracker
20/03/16 12:33:19 INFO SparkEnv: Registering BlockManagerMaster
20/03/16 12:33:19 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/03/16 12:33:19 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/03/16 12:33:19 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-b946b14f-a52d-4467-8028-503ed7ae93da
20/03/16 12:33:19 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
20/03/16 12:33:19 INFO SparkEnv: Registering OutputCommitCoordinator
20/03/16 12:33:19 INFO Utils: Successfully started service 'SparkUI' on port 4042.
20/03/16 12:33:19 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://sample:4042
20/03/16 12:33:19 INFO SparkContext: Added JAR file:/abc/xyz/load-1.0-SNAPSHOT-all.jar at spark://sample:35746/jars/load-1.0-SNAPSHOT-all.jar with timestamp 1584347599756
20/03/16 12:33:19 INFO SparkContext: Added file file:///etc/spark/conf/hive-site.xml at file:///etc/spark/conf/hive-site.xml with timestamp 1584347599776
20/03/16 12:33:19 INFO Utils: Copying /etc/spark/conf/hive-site.xml to /tmp/spark-914265c5-6115-4aca-8b85-2cd49a530fae/userFiles-aaca5153-ce38-489a-a020-c2477fddc66e/hive-site.xml
20/03/16 12:33:19 INFO Executor: Starting executor ID driver on host localhost
20/03/16 12:33:19 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45179.
20/03/16 12:33:19 INFO NettyBlockTransferService: Server created on sample:45179
20/03/16 12:33:19 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/03/16 12:33:19 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, sample, 45179, None)
20/03/16 12:33:19 INFO BlockManagerMasterEndpoint: Registering block manager sample:45179 with 366.3 MB RAM, BlockManagerId(driver, lhdpegde2u.enbduat.com, 45179, None)
20/03/16 12:33:19 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, sample, 45179, None)
20/03/16 12:33:19 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, sample, 45179, None)
20/03/16 12:33:20 INFO EventLoggingListener: Logging events to hdfs:/spark2-history/local-1584347599812
20/03/16 12:33:20 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
20/03/16 12:33:20 INFO SharedState: loading hive config file: file:/etc/spark2/2.6.5.0-292/0/hive-site.xml
20/03/16 12:33:21 INFO SharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('/apps/hive/warehouse').
20/03/16 12:33:21 INFO SharedState: Warehouse path is '/apps/hive/warehouse'.
20/03/16 12:33:21 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
20/03/16 12:33:22 INFO CodeGenerator: Code generated in 184.728545 ms
20/03/16 12:33:23 INFO CodeGenerator: Code generated in 10.538159 ms
20/03/16 12:33:23 INFO CodeGenerator: Code generated in 8.809847 ms
+-------+----------------+--------------------+
|   name|     description|         locationUri|
+-------+----------------+--------------------+
|default|default database|/apps/hive/warehouse|
+-------+----------------+--------------------+

20/03/16 12:33:23 INFO CodeGenerator: Code generated in 7.13541 ms
20/03/16 12:33:23 INFO CodeGenerator: Code generated in 5.771691 ms
+------------+
|databaseName|
+------------+
|     default|
+------------+

Exception in thread "main" org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'sample' not found;
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.org$apache$spark$sql$catalyst$catalog$SessionCatalog$$requireDbExists(SessionCatalog.scala:177)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.setCurrentDatabase(SessionCatalog.scala:259)
        at org.apache.spark.sql.execution.command.SetDatabaseCommand.run(databases.scala:59)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
        at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
        at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
        at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3253)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
        at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3252)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
        at ProcessXML.main(ProcessXML.java:95)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:906)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/03/16 12:33:23 INFO SparkContext: Invoking stop() from shutdown hook
20/03/16 12:33:23 INFO SparkUI: Stopped Spark web UI at http://sample:4042
20/03/16 12:33:24 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/03/16 12:33:24 INFO MemoryStore: MemoryStore cleared
20/03/16 12:33:24 INFO BlockManager: BlockManager stopped
20/03/16 12:33:24 INFO BlockManagerMaster: BlockManagerMaster stopped
20/03/16 12:33:24 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/03/16 12:33:24 INFO SparkContext: Successfully stopped SparkContext
20/03/16 12:33:24 INFO ShutdownHookManager: Shutdown hook called
20/03/16 12:33:24 INFO ShutdownHookManager: Deleting directory /tmp/spark-37386c3b-855a-4e09-a372-e8d12a08eebc
20/03/16 12:33:24 INFO ShutdownHookManager: Deleting directory /tmp/spark-914265c5-6115-4aca-8b85-2cd49a530fae

Please let me know what I did wrong and where.

Thanks in advance,

Gowtham R
