I'm not sure why first("traitvalue") works in the query below that produces the output DataFrame. What does first("traitvalue") mean here? Please advise.
Input DataFrame:
val df = sc.parallelize(List(
  ("1", "NA", "action", "Heavy", "NY"),
  ("1", "NA", "comedy", "light", "NY"),
  ("1", "NA", "horror", "light", "NY"),
  ("1", "NA", "horror", "light", "KY"),
  ("2", "NA", "horror", "light", "NY")
)).toDF("ban", "yr_mon", "genre", "traitvalue", "state")
+---+------+------+----------+-----+
|ban|yr_mon| genre|traitvalue|state|
+---+------+------+----------+-----+
|  1|    NA|action|     Heavy|   NY|
|  1|    NA|comedy|     light|   NY|
|  1|    NA|horror|     light|   NY|
|  1|    NA|horror|     light|   KY|
|  2|    NA|horror|     light|   NY|
+---+------+------+----------+-----+
Output DataFrame:
df.groupBy($"ban",$"state").pivot("genre").agg(first("traitvalue")).show
+---+-----+------+------+------+
|ban|state|action|comedy|horror|
+---+-----+------+------+------+
|  2|   NY|  null|  null| light|
|  1|   NY| Heavy| light| light|
|  1|   KY|  null|  null| light|
+---+-----+------+------+------+
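
For comparison, here is a minimal plain-Scala sketch (no Spark involved) of how I read the query: rows are grouped by (ban, state), each distinct genre becomes a column, and each cell keeps the first traitvalue found for that group and genre, or nothing if there is none. The object name PivotFirstSketch and the helper pivotFirst are hypothetical, and this only mirrors my assumption about the semantics, not how Spark actually implements pivot or first.

```scala
// Plain-Scala sketch (assumption about semantics, not Spark's implementation):
// mimics groupBy(ban, state).pivot(genre).agg(first(traitvalue)).
object PivotFirstSketch {
  // Same rows as the input DataFrame: (ban, yr_mon, genre, traitvalue, state)
  val rows: List[(String, String, String, String, String)] = List(
    ("1", "NA", "action", "Heavy", "NY"),
    ("1", "NA", "comedy", "light", "NY"),
    ("1", "NA", "horror", "light", "NY"),
    ("1", "NA", "horror", "light", "KY"),
    ("2", "NA", "horror", "light", "NY")
  )

  // Pivot columns: the distinct genre values, sorted
  val genres: List[String] = rows.map(_._3).distinct.sorted

  // For each (ban, state) group and each genre, keep the first traitvalue seen, if any
  def pivotFirst: Map[(String, String), List[Option[String]]] =
    rows.groupBy(r => (r._1, r._5)).map { case (key, group) =>
      key -> genres.map(g => group.find(_._3 == g).map(_._4))
    }

  def main(args: Array[String]): Unit =
    pivotFirst.foreach { case ((ban, state), cells) =>
      // Missing cells are printed as "null", as in the show output
      println((ban :: state :: cells.map(_.getOrElse("null"))).mkString("|"))
    }
}
```

In this sample every (ban, state, genre) combination has at most one row, so picking the first value per cell loses nothing; with several rows per cell, only one of them would be kept.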