Use .groupBy with to_json (Spark 2.4+), collect_list, and struct.
Example:
import org.apache.spark.sql.functions._
// .toDF below relies on spark.implicits._, which is already in scope in spark-shell

val df = Seq((1, "apple", 100), (1, "banana", 105), (2, "grapes", 102), (2, "orange", 101), (2, "apple", 101))
  .toDF("id", "fruit", "buy_time")

// collect one struct per row into an array, then serialize the whole array to a JSON string
df.groupBy("id")
  .agg(to_json(collect_list(struct(col("fruit"), col("buy_time").alias("time")))).alias("buy_info"))
  .show(10, false)
//+---+------------------------------------------------------------------------------------------+
//|id |buy_info |
//+---+------------------------------------------------------------------------------------------+
//|1 |[{"fruit":"apple","time":100},{"fruit":"banana","time":105}] |
//|2 |[{"fruit":"grapes","time":102},{"fruit":"orange","time":101},{"fruit":"apple","time":101}]|
//+---+------------------------------------------------------------------------------------------+
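Since to_json produces a plain StringType column, you can restore the structure later with from_json if downstream code needs it. A minimal sketch, assuming the aggregated result above is saved as grouped (the names buySchema and buy_info_parsed are just for illustration):

import org.apache.spark.sql.types._

val grouped = df.groupBy("id")
  .agg(to_json(collect_list(struct(col("fruit"), col("buy_time").alias("time")))).alias("buy_info"))

// schema of the serialized array: one struct per purchase
val buySchema = ArrayType(StructType(Seq(
  StructField("fruit", StringType),
  StructField("time", IntegerType)
)))

// parse the JSON string back into an array<struct<fruit:string,time:int>> column
grouped.withColumn("buy_info_parsed", from_json(col("buy_info"), buySchema)).printSchema()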