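import org.apache.spark.sql.functions.{array_join, collect_set}
import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell

val df = List(
  ("id1", "id2"),
  ("id2", "id3"),
  ("id1", "id4"),
  ("id3", "id4")
).toDF("col1", "col2")

// Neighbours of each col1 value, then neighbours of each col2 value.
// union matches columns by position, so the second frame's key lines up with col1.
df.groupBy('col1).agg(array_join(collect_set('col2), ",").as("col2"))
  .union(
    df.groupBy('col2).agg(array_join(collect_set('col1), ",").as("col2"))
  )
  // Merge the two partial lists for each id.
  .groupBy('col1).agg(array_join(collect_set('col2), ",").as("col2"))
  .show()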
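Output (row order and the order of ids within col2 may vary, since collect_set does not guarantee ordering):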
+----+-------+
|col1|   col2|
+----+-------+
| id3|id2,id4|
| id1|id2,id4|
| id2|id1,id3|
| id4|id1,id3|
+----+-------+
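To make the logic easier to follow, here is a minimal sketch of the same idea using plain Scala collections instead of Spark. It is only an illustration, not a replacement for the DataFrame code: the names `pairs`, `bothDirections` and `neighbours` are made up for this example, and the ids are sorted so that the result is deterministic.

```scala
// Same input pairs as in the DataFrame above.
val pairs = List(("id1", "id2"), ("id2", "id3"), ("id1", "id4"), ("id3", "id4"))

// Emit every pair in both directions, so each id shows up as a key
// (this plays the role of the union of the two groupBy results).
val bothDirections = pairs ++ pairs.map { case (a, b) => (b, a) }

// Group by the key and join the distinct, sorted ids on the other side.
val neighbours: Map[String, String] =
  bothDirections
    .groupBy(_._1)
    .map { case (id, edges) => id -> edges.map(_._2).distinct.sorted.mkString(",") }

neighbours.toSeq.sortBy(_._1).foreach { case (id, ns) => println(s"$id -> $ns") }
```

Each id ends up paired with every id it appears next to in either column, which is what the two groupBy calls plus union compute in the Spark version.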
Hope this helps.