The simplest way is to use a left_anti join, which keeps only the rows of df2 that have no match in df1 on (id, trans_id):
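// Assumes a SparkSession named spark is in scope (as in spark-shell)
import spark.implicits._ // needed for toDF
val df1 = Seq(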
(1, "a"), (1, "b"), (1, "c"),
(2, "c"), (2, "d"), (2, "e"), (2, "f")
).toDF("id", "trans_id")
val df2 = Seq(
(1, "a", 0.3), (1, "b", 0.4), (1, "c", 0.5), (1, "d", 0.1), (1, "e", 0.2), (1, "f", 0.5),
(2, "a", 0.1), (2, "b", 0.5), (2, "c", 0.6), (2, "d", 0.8), (2, "e", 0.9), (2, "f", 0.2)
).toDF("id", "trans_id", "score")
df2.join(df1, Seq("id", "trans_id"), "left_anti").show
// +---+--------+-----+
// | id|trans_id|score|
// +---+--------+-----+
// |  1|       d|  0.1|
// |  1|       e|  0.2|
// |  1|       f|  0.5|
// |  2|       a|  0.1|
// |  2|       b|  0.5|
// +---+--------+-----+
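To make the semantics of the join concrete, here is a minimal sketch of the same filtering done with plain Scala collections, no Spark involved. The object name LeftAntiSketch and the field names left, right, rightKeys and antiJoined are illustrative only and are not part of any Spark API. It only shows what "keep the left rows without a match on the right" means for this sample data.

```scala
object LeftAntiSketch {
  // Same sample data as df2 (left side) and df1 (right side) above.
  val left: Seq[(Int, String, Double)] = Seq(
    (1, "a", 0.3), (1, "b", 0.4), (1, "c", 0.5), (1, "d", 0.1), (1, "e", 0.2), (1, "f", 0.5),
    (2, "a", 0.1), (2, "b", 0.5), (2, "c", 0.6), (2, "d", 0.8), (2, "e", 0.9), (2, "f", 0.2)
  )
  val right: Seq[(Int, String)] = Seq(
    (1, "a"), (1, "b"), (1, "c"),
    (2, "c"), (2, "d"), (2, "e"), (2, "f")
  )

  // Set of (id, trans_id) keys that exist on the right side.
  val rightKeys: Set[(Int, String)] = right.toSet

  // Anti join: keep a left row only if its key is absent from the right side.
  val antiJoined: Seq[(Int, String, Double)] =
    left.filterNot { case (id, transId, _) => rightKeys.contains((id, transId)) }

  def main(args: Array[String]): Unit =
    antiJoined.foreach(println)
}
```

The five remaining tuples correspond to the five rows in the show output above. On real DataFrames the left_anti join is still the better choice, since Spark plans and distributes it for you.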