I am trying to build an array of structs from the result of a Spark SQL query. Is there a way to collect the SQL result rows into an array of struct records?
Example: I have the following data after running the Spark SQL query:
ID    NAME    DEPT  FROM_DT     TO_DT       EMAIL
-----------------------------------------------------------------------------
1234 Robert 101 02/01/2012 03/14/2014 1234@GOG.com
1234 Robert 102 03/15/2014 07/04/2015 1234@GOG.com
1234 Robert 103 07/05/2015 03/25/2019 1234@GOG.com
6754 Albert 102 03/01/2012 09/19/2015 6754@GOG.com
6754 Albert 101 09/20/2015 03/25/2019 6754@GOG.com
I am trying to reshape the above result set into the following format using pyspark2:
{1234, Robert, [{DEPT:101, FROM_DT:02/01/2012, TO_DT:03/14/2014},
{DEPT:102, FROM_DT:03/15/2014, TO_DT:07/04/2015},
{DEPT:103, FROM_DT:07/05/2015, TO_DT:03/25/2019}], 1234@GOG.com}
{6754, Albert, [{DEPT:102, FROM_DT:03/01/2012, TO_DT:09/19/2015},
{DEPT:101, FROM_DT:09/20/2015, TO_DT:03/25/2019}], 6754@GOG.com}
I tried a Spark SQL query that produces the flattened data, but I need the nested format above so I can load it into a Hive struct.
from pyspark.sql import Row

sc = spark.sparkContext
# Split each line into fields (assuming the files are comma-delimited;
# adjust the delimiter to match the actual file format)
raw_dept_data = sc.textFile("Raw_DEPT_File/part-m-00000").map(lambda line: line.split(","))
dept_rdd = raw_dept_data.map(lambda r: Row(ID=r[0], NAME=r[1], DEPT=r[2], FROM_DT=r[3], TO_DT=r[4]))
dept_dataframe = spark.createDataFrame(dept_rdd)
dept_dataframe.createOrReplaceTempView("History_Dept")

email_data = sc.textFile("Raw_Email_File/part-m-00000").map(lambda line: line.split(","))
email_rdd = email_data.map(lambda r: Row(ID=r[0], NAME=r[1], EMAIL=r[2]))
email_dataframe = spark.createDataFrame(email_rdd)
email_dataframe.createOrReplaceTempView("History_Email")

flat_df = spark.sql("SELECT DP.ID, EM.NAME, DP.DEPT, DP.FROM_DT, DP.TO_DT, EM.EMAIL FROM History_Dept AS DP JOIN History_Email AS EM ON DP.ID = EM.ID")
How can I convert this result into the format shown above?