I hope the following example helps:
import org.apache.spark.sql.functions.col

val df = sc.parallelize(Seq(("MS 1", "X"), ("MS 2", "Y"), ("MS 3", "X"), ("MS 4", "E"), ("MS 3", "E"))).toDF("ms", "event")
df.show

val filter1DF = df.filter(col("event") === "X")
val filter2DF = df.filter(col("event") === "E")

// Left-outer join on "ms", then keep only the "X" rows that found no matching "E" row
val result = filter1DF.as("x").join(filter2DF.as("e"), List("ms"), "left_outer").where(col("e.event").isNull).select(col("x.ms"))
result.show
Results:
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.0
/_/
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val df = sc.parallelize(Seq(("MS 1","X"), ("MS 2", "Y"), ("MS 3", "X"), ("MS 4", "E"), ("MS 3", "E"))).toDF("ms", "event")
df: org.apache.spark.sql.DataFrame = [ms: string, event: string]
scala> df.show
+-----+-----+
| ms|event|
+-----+-----+
| MS 1| X|
| MS 2| Y|
| MS 3| X|
| MS 4| E|
| MS 3| E|
+-----+-----+
scala> val filter1DF = df.filter(col("event") === "X")
filter1DF: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [ms: string, event: string]
scala> val filter2DF = df.filter(col("event") === "E")
filter2DF: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [ms: string, event: string]
scala> val result = filter1DF.as("x").join(filter2DF.as("e"), List("ms"), "left_outer").where(col("e.event").isNull).select(col("x.ms"))
result: org.apache.spark.sql.DataFrame = [ms: string]
scala> result.show
+----+
| ms|
+----+
|MS 1|
+----+
scala>
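As a side note: the `left_outer` + `isNull` pattern above works, but Spark 2.x also provides a dedicated `left_anti` join type that expresses "keep left-side rows with no match on the right" directly, without the explicit null filter. A minimal self-contained sketch (assuming a local `SparkSession` rather than the spark-shell's implicit `sc`):

```scala
import org.apache.spark.sql.SparkSession

// Local session for a standalone run; in spark-shell this already exists as `spark`
val spark = SparkSession.builder().master("local[*]").appName("anti-join-sketch").getOrCreate()
import spark.implicits._

val df = Seq(("MS 1", "X"), ("MS 2", "Y"), ("MS 3", "X"), ("MS 4", "E"), ("MS 3", "E"))
  .toDF("ms", "event")

// left_anti keeps rows of the left side that have NO match on the right,
// replacing the explicit where(col("e.event").isNull) step
val result = df.filter($"event" === "X")
  .join(df.filter($"event" === "E"), Seq("ms"), "left_anti")
  .select($"ms")

result.show()  // only the "MS 1" row survives
```

Anti joins also avoid the ambiguity of referencing the right side's columns after the join, since `left_anti` only ever returns the left side's columns.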