Serializing a table into nested JSON using Apache Spark SQL
0 votes
27 February 2020

My question is the same as this one, but can the following:

df.withColumn("VEHICLE",struct("VEHICLENUMBER","CUSTOMERID")).
  select("VEHICLE","ACCOUNTNO"). //only select reqired columns
  groupBy("ACCOUNTNO"). 
  agg(collect_list("VEHICLE").as("VEHICLE")). //for the same group create a list of vehicles
  toJSON. //convert to json
  show(false)

be rewritten in pure SQL? I mean something like this:

val sqlDF = spark.sql("SELECT VEHICLE, ACCOUNTNO as collect_list(ACCOUNTNO) FROM VEHICLES GROUP BY ACCOUNTNO")
sqlDF.show()

Is that possible?

1 Answer

0 votes
27 February 2020

The SQL equivalent of your example, with data, would be:

scala> val df = Seq((10003014,"MH43AJ411",20000000),
     |   (10003014,"MH43AJ411",20000001),
     |   (10003015,"MH12GZ3392",20000002)
     | ).toDF("ACCOUNTNO","VEHICLENUMBER","CUSTOMERID").withColumn("VEHICLE",struct("VEHICLENUMBER","CUSTOMERID"))
df: org.apache.spark.sql.DataFrame = [ACCOUNTNO: int, VEHICLENUMBER: string ... 2 more fields]

scala> df.createOrReplaceTempView("vehicles") // registerTempTable is deprecated since Spark 2.0

scala> val sqlDF = spark.sql("SELECT ACCOUNTNO, collect_list(VEHICLE) as ACCOUNT_LIST FROM VEHICLES group by ACCOUNTNO").toJSON
sqlDF: org.apache.spark.sql.Dataset[String] = [value: string]

scala> sqlDF.show(false)
+-----------------------------------------------------------------------------------------------------------------------------------------------+
|value                                                                                                                                          |
+-----------------------------------------------------------------------------------------------------------------------------------------------+
|{"ACCOUNTNO":10003014,"ACCOUNT_LIST":[{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000000},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001}]}|
|{"ACCOUNTNO":10003015,"ACCOUNT_LIST":[{"VEHICLENUMBER":"MH12GZ3392","CUSTOMERID":20000002}]}                                                   |
+-----------------------------------------------------------------------------------------------------------------------------------------------+
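
Note that this still builds the VEHICLE struct through the DataFrame API (withColumn plus struct) before registering the view. If you want the entire pipeline in a single SQL statement, including the struct creation and the JSON serialization, Spark SQL's named_struct, collect_list and to_json functions can express the whole thing. A minimal sketch against the vehicles view above, assuming Spark 2.4+ (where to_json handles arrays of structs); the variable name and output column are illustrative:

val pureSql = spark.sql("""
  SELECT to_json(named_struct(
           'ACCOUNTNO', ACCOUNTNO,
           'ACCOUNT_LIST', collect_list(named_struct('VEHICLENUMBER', VEHICLENUMBER,
                                                     'CUSTOMERID', CUSTOMERID))
         )) AS value
  FROM vehicles
  GROUP BY ACCOUNTNO
""")
pureSql.show(false) // one JSON string per ACCOUNTNO, same shape as the toJSON output above

The trade-off is that the columns have to be spelled out in named_struct instead of being inferred from the schema, but it avoids the extra withColumn/toJSON round-trip entirely.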