Here is how you can do it in Spark Scala.
import org.apache.spark.sql.functions.count
import spark.implicits._

// Build a sample DataFrame with duplicate rows per key
val test = spark.sparkContext.parallelize(List( ("1", "Complete", "yes"),
("1", "Complete", "yes"),
("2", "Inprogress", "no"),
("2", "Inprogress", "no"),
("3", "Not yet started", "initiate"),
("3", "Not yet started", "initiate"))
).toDF("ColumnA","ColumnB","ColumnC")
test.show()
// Group by ColumnA, turn the distinct ColumnB values into columns,
// and count the ColumnC entries in each cell
val test_pivot = test.groupBy("ColumnA")
  .pivot("ColumnB")
  .agg(count("ColumnC"))

// Pivoting leaves nulls for missing combinations; replace them with 0
test_pivot.na.fill(0).show(false)
And the output:
+-------+--------+----------+---------------+
|ColumnA|Complete|Inprogress|Not yet started|
+-------+--------+----------+---------------+
|3      |0       |0         |2              |
|1      |2       |0         |0              |
|2      |0       |2         |0              |
+-------+--------+----------+---------------+
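As a side note (not part of the original answer): when called with only the column name, pivot first runs an extra job to discover the distinct values of ColumnB. If you already know those values, you can pass them explicitly via the two-argument pivot overload, which skips that scan and also fixes the column order. A minimal sketch, assuming the three statuses shown above are the complete set:

// Hypothetical variant: list the expected ColumnB values up front so
// Spark does not have to scan the data to discover them
val expectedStatuses = Seq("Complete", "Inprogress", "Not yet started")

val test_pivot_fixed = test.groupBy("ColumnA")
  .pivot("ColumnB", expectedStatuses)
  .agg(count("ColumnC"))

test_pivot_fixed.na.fill(0).show(false)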