If you need the first non-null value of each column within a group, you can get it with the "first" function by passing ignoreNulls = true:
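import org.apache.spark.sql.functions.first
// Assumes a SparkSession named `spark` is in scope (as in spark-shell); toDF needs its implicits.
import spark.implicits._

val df = Seq(
  (50, Some(2), None, None),
  (34, Some(4), None, None),
  (34, None, Some(true), Some(60000.0)),
  (32, None, Some(false), Some(35000.0))
).toDF("age", "children", "education", "income")

// For each age, keep the first non-null value seen in every column.
val result = df
  .groupBy("age")
  .agg(
    first("children", ignoreNulls = true).alias("children"),
    first("education", ignoreNulls = true).alias("education"),
    first("income", ignoreNulls = true).alias("income")
  )

result.orderBy("age").show(false)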
Output:
+---+--------+---------+-------+
|age|children|education|income |
+---+--------+---------+-------+
|32 |null    |false    |35000.0|
|34 |4       |true     |60000.0|
|50 |2       |null     |null   |
+---+--------+---------+-------+
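To make the semantics concrete, here is a minimal plain-Scala sketch of what "first non-null per group" means, using the same four rows. It does not use Spark at all; the names Person, firstNonNull and FirstNonNullSketch are hypothetical and exist only for this illustration. One caveat: the sketch picks values in the in-memory order of the sequence, whereas Spark's first depends on row order, which may be non-deterministic after a shuffle. In this data set every group has at most one non-null value per column, so the result does not depend on order.

```scala
// Plain-Scala model of "first non-null value per group" (no Spark involved).
// Person, firstNonNull and FirstNonNullSketch are hypothetical names for illustration.
final case class Person(
  age: Int,
  children: Option[Int],
  education: Option[Boolean],
  income: Option[Double]
)

object FirstNonNullSketch {
  val rows: Seq[Person] = Seq(
    Person(50, Some(2), None, None),
    Person(34, Some(4), None, None),
    Person(34, None, Some(true), Some(60000.0)),
    Person(32, None, Some(false), Some(35000.0))
  )

  // First defined value of a field within a group, in encounter order.
  def firstNonNull[A](group: Seq[Person])(field: Person => Option[A]): Option[A] =
    group.map(field).collectFirst { case Some(v) => v }

  // One merged record per age, sorted by age.
  val merged: Seq[Person] = rows
    .groupBy(_.age)
    .toSeq
    .sortBy(_._1)
    .map { case (age, group) =>
      Person(
        age,
        firstNonNull(group)(_.children),
        firstNonNull(group)(_.education),
        firstNonNull(group)(_.income)
      )
    }

  def main(args: Array[String]): Unit =
    merged.foreach(println)
}
```

The merged records correspond row for row to the table above, with None standing in for null.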