I am trying to split this column into several columns, but there seems to be a data type problem, even though I declared the schema as an array type.
Here is what the column looks like:
Column_x
[[{"Key":"a","Value":"40000.0"},{"Key":"b","Value":"0.0"},{"Key":"c","Value":"0.0"},{"Key":"f","Value":"false"},{"Key":"e","Value":"ADB"},{"Key":"d","Value":"true"}]]
[[{"Key":"a","Value":"100000.0"},{"Key":"b","Value":"1.5"},{"Key":"c","Value":"1.5"},{"Key":"d","Value":"false"},{"Key":"e","Value":"Rev30"},{"Key":"f","Value":"true"},{"Key":"g","Value":"48600.0"},{"Key":"g","Value":"0.0"},{"Key":"h","Value":"0.0"}],[{"Key":"i","Value":"100000.0"},{"Key":"j","Value":"1.5"},{"Key":"k","Value":"1.5"},{"Key":"l","Value":"false"},{"Key":"m","Value":"Rev30"},{"Key":"n","Value":"true"},{"Key":"o","Value":"48600.0"},{"Key":"p","Value":"0.0"},{"Key":"q","Value":"0.0"}]]
The desired output is something like this:
Key Value
a 10000
b 200000
.
.
.
.
a 100000.0
b 1.5
Here is my attempt so far:
import pyspark.sql.functions as F
from pyspark.sql.functions import col
from pyspark.sql.types import *

schema = ArrayType(ArrayType(StructType([
    StructField("Key", StringType()),
    StructField("Value", StringType())
])))

kn_sx = (kn_s
    .withColumn("Keys", F.explode(F.from_json("Column_x", schema)))
    .withColumn("Key", col("Keys.Key"))
    .withColumn("Values", F.explode(F.from_json("Column_x", schema)))
    .withColumn("Value", col("Values.Value"))
    .drop("Values"))
And here is the error:
AnalysisException: u"cannot resolve 'jsontostructs(`Column_x`)' due to data type mismatch: argument 1 requires string type, however, '`Column_x`' is of array<array<struct<Key:string,Value:string>>> type
Any help would be much appreciated.