
Hive on Spark on OS X

0 votes
asked 24 February 2019

I'm new to the big data field.

  • Hadoop 3.1.1
  • Hive 3.1.1
  • Spark 2.3.2

All of them were installed with brew.

After setting up MySQL as the metastore, I set

<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>

in hive-site.xml.

Everything works fine with Hive on MR.

With Hive on Spark I can create databases and tables and run SELECT queries, but as soon as I run an INSERT I get an error:

➜  conf hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/Cellar/hive/3.1.1/libexec/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/Cellar/hadoop/3.1.1/libexec/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = aa5104a5-cc5c-4081-8cb4-198d17b22e55

Logging initialized using configuration in jar:file:/usr/local/Cellar/hive/3.1.1/libexec/lib/hive-common-3.1.1.jar!/hive-log4j2.properties Async: true
Hive Session ID = 1f84a3d7-16f6-48dd-bbe7-bcd78982fa72
hive> use sparktest;
OK
Time taken: 1.054 seconds
hive> select * from student;
OK
1   Xueqian F   23
2   Weiliang    M   24
Time taken: 1.577 seconds, Fetched: 2 row(s)
hive> insert into student values(2,'Weiliang','M',25);
Query ID = wyx_20190224200349_87e1103b-dbfe-4761-aa6e-40bed1811399
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session c3fb44fc-eada-4878-a86e-9b339787e207)'
FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session c3fb44fc-eada-4878-a86e-9b339787e207
hive>
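
From the Hive on Spark "Getting Started" guide I gather that Hive also has to be told where Spark lives and which master to use, on top of hive.execution.engine. Something along these lines in hive-site.xml (only a sketch — the values below are placeholders, not my actual configuration):

<property>
  <name>spark.master</name>
  <value>local</value>
</property>
<property>
  <name>spark.home</name>
  <value>/path/to/spark</value> <!-- placeholder: wherever brew put Spark -->
</property>

Is the "Failed to create Spark client" error simply Hive not being able to find or launch Spark (missing settings like these, or Spark jars missing from Hive's lib), or is it something else?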

In hive-site.xml I can see that hive.exec.reducers.bytes.per.reducer and the other settings mentioned above are already set, so why would they need to be set again?

<property>
  <name>hive.exec.reducers.bytes.per.reducer</name>
  <value>256000000</value>
  <description>size per reducer.The default is 256Mb, i.e if the input size is 1G, it will use 4 reducers.</description>
</property>
<property>
  <name>hive.exec.reducers.max</name>
  <value>1009</value>
  <description>
    max number of reducers will be used. If the one specified in the configuration parameter mapred.reduce.tasks is
    negative, Hive will use this one as the max number of reducers when automatically determine number of reducers.
  </description>
</property>

I think my Hive does support Spark:

➜  ~ spark-shell
2019-02-24 20:29:53 WARN  Utils:66 - Your hostname, wuyuxideMacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 192.168.1.100 instead (on interface en0)
2019-02-24 20:29:53 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2019-02-24 20:29:54 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://192.168.1.100:4040
Spark context available as 'sc' (master = local[*], app id = local-1551011399409).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.2
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext

scala>

I am using pyspark==2.4.0 to access Hive on Spark:

from pyspark import SparkContext
from pyspark.sql import HiveContext


# create a SparkContext and query the Hive table through a HiveContext
with SparkContext() as sc:
    hive_context = HiveContext(sc)
    hive_context.sql('SELECT * FROM sparktest.student').show()

But I get an error saying sparktest.student is not found:

2019-02-24 20:59:31 WARN  Utils:66 - Your hostname, wuyuxideMacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 192.168.1.100 instead (on interface en0)
2019-02-24 20:59:31 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2019-02-24 20:59:31 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2019-02-24 20:59:36 WARN  ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
2019-02-24 20:59:36 WARN  ObjectStore:568 - Failed to get database sparktest, returning NoSuchObjectException
Traceback (most recent call last):
  File "/Users/wyx/project/py3.7aio/.env/lib/python3.6/site-packages/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/Users/wyx/project/py3.7aio/.env/lib/python3.6/site-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o23.sql.
: org.apache.spark.sql.AnalysisException: Table or view not found: `sparktest`.`student`; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation `sparktest`.`student`

    at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:90)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:85)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:85)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:95)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:108)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:105)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
    at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
    at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:745)


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/wyx/project/py3.7aio/spark/hive.py", line 13, in <module>
    hive_context.sql('SELECT * FROM sparktest.student').show()
  File "/Users/wyx/project/py3.7aio/.env/lib/python3.6/site-packages/pyspark/sql/context.py", line 358, in sql
    return self.sparkSession.sql(sqlQuery)
  File "/Users/wyx/project/py3.7aio/.env/lib/python3.6/site-packages/pyspark/sql/session.py", line 767, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/Users/wyx/project/py3.7aio/.env/lib/python3.6/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/Users/wyx/project/py3.7aio/.env/lib/python3.6/site-packages/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: "Table or view not found: `sparktest`.`student`; line 1 pos 14;\n'Project [*]\n+- 'UnresolvedRelation `sparktest`.`student`\n"

Process finished with exit code 1
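
The ObjectStore warnings above ("Failed to get database sparktest, returning NoSuchObjectException") make me suspect that PySpark is not reading my hive-site.xml at all and is creating its own empty local metastore. This is roughly what I plan to try next — only a sketch, and the thrift://localhost:9083 metastore URI is an assumption, not something taken from my current setup:

from pyspark.sql import SparkSession

# Build a Spark session with Hive support enabled, so Spark talks to the real
# Hive metastore instead of spinning up an embedded Derby one.
# NOTE: the metastore URI is a placeholder; it assumes the Hive metastore is
# running as a thrift service on localhost:9083.
spark = (
    SparkSession.builder
    .appName("hive-on-spark-check")
    .config("hive.metastore.uris", "thrift://localhost:9083")
    .enableHiveSupport()
    .getOrCreate()
)

# If the session is wired to the same metastore as the Hive CLI,
# the database created there should show up here.
spark.sql("SHOW DATABASES").show()
spark.sql("SELECT * FROM sparktest.student").show()

Is that the right way to point PySpark at the same metastore, or is it enough to copy hive-site.xml into $SPARK_HOME/conf?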
