Spark SQL query `SHOW VIEWS IN` via Hive metastore fails with "missing 'FUNCTIONS' at 'IN'"
18 March 2020

I have Spark (2.4.4) running with a Hive metastore. When I access it via JDBC/ODBC with a query like

SHOW VIEWS IN space1

I get the following error:

[2020-03-18T10:54:57,722][DEBUG][HiveServer2-Background-Pool: Thread-203][org.apache.spark.sql.execution.SparkSqlParser][][] Parsing command: SHOW VIEWS IN `space1` 
[2020-03-18T10:54:57,733][ERROR][HiveServer2-Background-Pool: Thread-203][org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation][][] Error executing query, currentState RUNNING,  
org.apache.spark.sql.catalyst.parser.ParseException: 
missing 'FUNCTIONS' at 'IN'(line 1, pos 11)

== SQL ==
SHOW VIEWS IN `space1`
-----------^^^

    at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241) ~[spark-catalyst_2.11-2.4.4.jar:2.4.4]
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117) ~[spark-catalyst_2.11-2.4.4.jar:2.4.4]
    at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48) ~[spark-sql_2.11-2.4.4.jar:2.4.4]
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69) ~[spark-catalyst_2.11-2.4.4.jar:2.4.4]
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642) ~[spark-sql_2.11-2.4.4.jar:2.4.4]
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694) ~[spark-sql_2.11-2.4.4.jar:2.4.4]
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:232) [spark-hive-thriftserver_2.11-2.4.4.jar:2.4.4]
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:175) [spark-hive-thriftserver_2.11-2.4.4.jar:2.4.4]
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171) [spark-hive-thriftserver_2.11-2.4.4.jar:2.4.4]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_201]
    at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_201]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844) [hadoop-common-2.8.5.jar:?]
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:185) [spark-hive-thriftserver_2.11-2.4.4.jar:2.4.4]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_201]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_201]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_201]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_201]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
[2020-03-18T10:54:57,765][ERROR][HiveServer2-Background-Pool: Thread-203][org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation][][] Error running hive query:  
org.apache.hive.service.cli.HiveSQLException: org.apache.spark.sql.catalyst.parser.ParseException: 
missing 'FUNCTIONS' at 'IN'(line 1, pos 11)

== SQL ==
SHOW VIEWS IN `space1`
-----------^^^

    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:269) ~[spark-hive-thriftserver_2.11-2.4.4.jar:2.4.4]
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:175) [spark-hive-thriftserver_2.11-2.4.4.jar:2.4.4]
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171) [spark-hive-thriftserver_2.11-2.4.4.jar:2.4.4]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_201]
    at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_201]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844) [hadoop-common-2.8.5.jar:?]
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:185) [spark-hive-thriftserver_2.11-2.4.4.jar:2.4.4]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_201]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_201]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_201]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_201]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]

For example, I get it when connecting Tableau to my Spark instance, and I can also reproduce it by running the query explicitly through a SQL tool connected via JDBC.

Any ideas?

Note that a query like

SELECT * FROM `employer` WHERE `Name` IN ('John','Alex');

completes without any problems!

Someone else has also run into this problem before, but got no answer: https://community.powerbi.com/t5/Desktop/Spark-connector-issue/td-p/952481

...