val df1 = Seq(("order_1", "order_1_info"),
("order_2", "order_2_info")).toDF("order_id", "info")
val df2 = Seq(("order_1", "room_1", 100, "palace_1"),
("order_2", "room_2", 200, "palace_2"),
("order_1", "room_3", 100, "palace_3"),
("order_2", "room_8", 200, "palace_x"))
.toDF("order_id", "room_id", "room_price", "room_name")
val cols: Array[String] = df2.columns
val df3 = df2.groupBy("order_id").agg(collect_list(struct(cols.head, cols.tail:_*)) as "room")
val df4 = df1.join(df3, Seq("order_id"))
df4.show()
df4.printSchema()
В приведенном выше фрагменте кода я только что сделал несколько примеров данных для использования.
Выход: -
+--------+------------+--------------------+
|order_id| info| room|
+--------+------------+--------------------+
| order_1|order_1_info|[[order_1,room_1,...|
| order_2|order_2_info|[[order_2,room_2,...|
+--------+------------+--------------------+
Схема: -
root
|-- order_id: string (nullable = true)
|-- info: string (nullable = true)
|-- room: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- order_id: string (nullable = true)
| | |-- room_id: string (nullable = true)
| | |-- room_price: integer (nullable = false)
| | |-- room_name: string (nullable = true)
Надеюсь, это полезно