You can use the ignoreNulls parameter of first:
Example:
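// spark-shell already has these in scope; in an application, `spark` is your SparkSession
import org.apache.spark.sql.functions.first
import spark.implicits._

val df = Seq((1, Some(2)), (1, None), (2, None), (2, Some(3))).toDF("id", "visit_id")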
df.show
+---+--------+
| id|visit_id|
+---+--------+
|  1|       2|
|  1|    null|
|  2|    null|
|  2|       3|
+---+--------+
// ignoreNulls = true makes first skip nulls and return the first non-null value in each group
df.groupBy("id").agg(first("visit_id", ignoreNulls = true).as("visit_id")).show
+---+--------+
| id|visit_id|
+---+--------+
|  1|       2|
|  2|       3|
+---+--------+
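To make the semantics concrete without a Spark session, here is a minimal plain-Scala sketch that models what groupBy plus first(..., ignoreNulls = true) computes on the example data: for each id, the first non-null visit_id in input order. It is only a model using Scala collections (None stands in for SQL NULL), not Spark's implementation, and the names FirstIgnoreNullsModel, firstIgnoreNulls and firstKeepNulls are hypothetical. Note that in real Spark the row order inside a group is not guaranteed after a shuffle, so first is non-deterministic unless that order is fixed.

```scala
object FirstIgnoreNullsModel {
  // Each row is (id, visit_id); None stands in for SQL NULL (hypothetical model data mirroring df).
  val rows: Seq[(Int, Option[Int])] =
    Seq((1, Some(2)), (1, None), (2, None), (2, Some(3)))

  // Models first(col, ignoreNulls = true): first defined value per id, None if all are null.
  def firstIgnoreNulls(data: Seq[(Int, Option[Int])]): Map[Int, Option[Int]] =
    data.groupBy(_._1).map { case (id, group) => id -> group.flatMap(_._2).headOption }

  // Models the default first(col): the first value per id, even if it is null.
  def firstKeepNulls(data: Seq[(Int, Option[Int])]): Map[Int, Option[Int]] =
    data.groupBy(_._1).map { case (id, group) => id -> group.head._2 }

  def main(args: Array[String]): Unit = {
    println(firstIgnoreNulls(rows).toList.sortBy(_._1)) // List((1,Some(2)), (2,Some(3)))
    println(firstKeepNulls(rows).toList.sortBy(_._1))   // List((1,Some(2)), (2,None))
  }
}
```

The second function shows why the parameter matters here: with the default behaviour, id 2 would get null because its first row has no visit_id.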