В основном вам нужно explode
arrays
вместе , используя arrays_zip
, который возвращает объединенный массив структур . Попробуй это. Я не проверял, но это должно работать.
from pyspark.sql import functions as F
df_json.select("id","type","name","ppu","topping","batters.*")\
.withColumn("zipped", F.explode(F.arrays_zip("batter","topping")))\
.select("id","type","name","ppu","zipped.*").show()
Вы также можете сделать это one by one
:
from pyspark.sql import functions as F
df1=df_json.select("id","type","name","ppu","topping","batters.*")\
.withColumn("batter", F.explode("batter"))\
.select("id","type","name","ppu","topping","batter")
df1.withColumn("topping", F.explode("topping")).select("id","type","name","ppu","topping.*","batter.*")