This code:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Assumes a SparkSession named `spark` is in scope (as in spark-shell).
val itemsSchema = List(
  StructField("itemname", StringType, nullable = false),
  StructField("itemid", IntegerType, nullable = false),
  StructField("coupons", BooleanType, nullable = false))
val purchasesSchema = List(
  StructField("itemname", StringType, nullable = false),
  StructField("purchases", IntegerType, nullable = false))

val items = Seq(Row("A", 1, true), Row("A", 2, false))
val purchases = Seq(Row("A", 10), Row("B", 10), Row("C", 10))

val itemsDF = spark.createDataFrame(
  spark.sparkContext.parallelize(items),
  StructType(itemsSchema)
)
val purchasesDF = spark.createDataFrame(
  spark.sparkContext.parallelize(purchases),
  StructType(purchasesSchema)
)

// Joining on Seq("itemname") keeps a single itemname column in the result.
// The default join type is inner, so only names present in both DataFrames survive.
purchasesDF.join(itemsDF, Seq("itemname")).show(false)
gives:
+--------+---------+------+-------+
|itemname|purchases|itemid|coupons|
+--------+---------+------+-------+
|A       |10       |1     |true   |
|A       |10       |2     |false  |
+--------+---------+------+-------+
Hope this helps.