Here is a solution with Spark — it's quite simple!
val originDf: DataFrame = Seq(
    ("A", "2015-01", "CAT", "30", "888.8", "1"),
    ("A", "2015-04", "CAT", "10", "14.3", "0.99"),
    ("A", "2015-11", "DOG", "6", "22.22", "0.65"),
    ("B", "2016-09", "BIRD", "1", "0.1", "0.11"))
  .toDF("key", "date", "column", "number", "cost", "ratio")
  .withColumn("column", lower(col("column")))
  .withColumn("number", col("number").cast("double"))
  .withColumn("cost", col("cost").cast("double"))
  .withColumn("ratio", col("ratio").cast("double"))
val expectedDf: DataFrame = Seq(
    ("A", "2015-01", null, null, null, "30", "888.8", "1", null, null, null),
    ("A", "2015-04", null, null, null, "10", "14.3", "0.99", null, null, null),
    ("A", "2015-11", null, null, null, null, null, null, "6", "22.22", "0.65"),
    ("B", "2016-09", "1", "0.1", "0.11", null, null, null, null, null, null))
  .toDF("key", "date", "bird_number", "bird_cost", "bird_ratio",
    "cat_number", "cat_cost", "cat_ratio", "dog_number", "dog_cost", "dog_ratio")
  .orderBy("key", "date")
And the implementation — I recommend putting this in a test class using FlatSpec, so that you can test your real function later.
val resultDf = originDf
  .groupBy("key", "date")
  .pivot("column")
  // alias each aggregate so the pivoted columns come out as
  // bird_number, bird_cost, ... instead of bird_max(number), ...
  .agg(max("number").as("number"), max("cost").as("cost"), max("ratio").as("ratio"))
  .orderBy("key", "date")
Be careful with the max function: I used it only because each (key, date, column) group here contains a single row, so max simply passes the value through. If your data can have several rows per group, choose the aggregate that actually matches your requirements.
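To make the groupBy + pivot + max semantics concrete without spinning up Spark, here is a plain-Scala collections sketch of the same transformation on the sample rows. The `Rec` case class and the `pivoted` map are my own illustrative names, not part of any Spark API:

```scala
// Sketch of what groupBy("key","date").pivot("column").max(...) computes,
// using ordinary Scala collections on the same four sample rows.
case class Rec(key: String, date: String, column: String,
               number: Double, cost: Double, ratio: Double)

val rows = Seq(
  Rec("A", "2015-01", "cat", 30, 888.8, 1.0),
  Rec("A", "2015-04", "cat", 10, 14.3, 0.99),
  Rec("A", "2015-11", "dog", 6, 22.22, 0.65),
  Rec("B", "2016-09", "bird", 1, 0.1, 0.11))

// For every (key, date) group, pivot the distinct `column` values into
// a wide map and take the max of each measure within that cell.
val pivoted: Map[(String, String), Map[String, (Double, Double, Double)]] =
  rows.groupBy(r => (r.key, r.date)).map { case (k, rs) =>
    k -> rs.groupBy(_.column).map { case (c, cs) =>
      c -> (cs.map(_.number).max, cs.map(_.cost).max, cs.map(_.ratio).max)
    }
  }

// e.g. pivoted(("B", "2016-09")) == Map("bird" -> (1.0, 0.1, 0.11))
```

Because every (key, date, column) cell holds exactly one row in this dataset, `max` just forwards the single value — which is why it reproduces `expectedDf` above.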