Конвертировать MapType col в Json: DataFrame - PullRequest
0 голосов
/ 14 февраля 2019

У меня есть Dataframe с таким содержимым:

+---+-----------------------------------------------------------------------------------------------------------------------------------+
|id |mapData                                                                                                                            |
+---+-----------------------------------------------------------------------------------------------------------------------------------+
|1  |Map(e1 -> WrappedArray({"number":"n1","strData":"d1","intData":2}), e2 -> WrappedArray({"number":"n1","strData":"d1","intData":2}))|
+---+-----------------------------------------------------------------------------------------------------------------------------------+

Я хочу преобразовать то же самое в Json, как

+---+-----------------------------------------------------------------------------------------------------------------------------------+
|1  |{"e1": [{"number": "n1","strData": "d1","intData": 2}],"e2": [{"number": "n1","strData": "d1","intData": 2}]}                      |
+---+-----------------------------------------------------------------------------------------------------------------------------------+

Я пытался df.withColumn("jsonData", to_json(col("mapData"))), но получая AnalysisException при запускето же самое.

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'structstojson(`mapData`)' due to data type mismatch: Input type map<string,array<string>> must be a struct or array of structs.;;
'Project [id#9, mapData#98, structstojson(mapData#98, Some(Asia/Kolkata)) AS jsonData#138]
+- Aggregate [id#9], [id#9, customaggregator2(executor_id#36, data#32, CustomAggregator2@3c38e2bf, 0, 0) AS mapData#98]
   +- Union
      :- Project [id#9, data#32, e1 AS executor_id#36]
      :  +- Aggregate [id#9], [id#9, collect_list(data#18, 0, 0) AS data#32]
      :     +- Project [id#9, number#10, strData#11, intData#12, structstojson(named_struct(number, number#10, strData, strData#11, intData, intData#12), Some(Asia/Kolkata)) AS data#18]
      :        +- Project [_1#4 AS id#9, _2#5 AS number#10, _3#6 AS strData#11, _4#7 AS intData#12]
      :           +- LocalRelation [_1#4, _2#5, _3#6, _4#7]
      +- Project [id#9, data#55, e2 AS executor_id#59]
         +- Aggregate [id#9], [id#9, collect_list(data#41, 0, 0) AS data#55]
            +- Project [id#9, number#10, strData#11, intData#12, structstojson(named_struct(number, number#10, strData, strData#11, intData, intData#12), Some(Asia/Kolkata)) AS data#41]
               +- Project [_1#4 AS id#9, _2#5 AS number#10, _3#6 AS strData#11, _4#7 AS intData#12]
                  +- LocalRelation [_1#4, _2#5, _3#6, _4#7]

Версия:

Scala: 2.11
Spark: 2.2
Добро пожаловать на сайт PullRequest, где вы можете задавать вопросы и получать ответы от других членов сообщества.
...