I have JSON data that looks like this:
[{"state": "Florida",
"shortname": "FL",
"info": {"governor": "Rick Scott"},
"counties": [{"name": "Dade",
"population": 12345,
"Attributes": [
{
"capture_date": "2020-01-29",
"Spirit_code": "TRLQR",
"value": 1
},
{
"capture_date": "2020-01-29",
"Spirit_code": "HAVPN",
"value": 57000
}
]},
{"name": "Broward",
"population": 40000,
"Attributes": [
{
"capture_date": "2020-01-29",
"Spirit_code": "GMSTP",
"value": 14
},
{
"capture_date": "2020-01-29",
"Spirit_code": "GWTPN",
"value": 11212
}
]
},
{"name": "Palm Beach",
"population": 60000,
"Attributes": [{
"capture_date": "2020-01-29",
"Spirit_code": "YGHMN",
"value": 154.01
},
{
"capture_date": "2020-01-29",
"Spirit_code": "CXZASD",
"value": 154.01
}]
}
]},
{"state": "Ohio",
"shortname": "OH",
"info": {"governor": "John Kasich"},
"counties": [{"name": "Summit", "population": 1234,
"Attributes": [{
"capture_date": "2020-01-29",
"Spirit_code": "QWERTY",
"value": 154.01
},
{
"capture_date": "2020-01-29",
"Spirit_code": "JKLGH",
"value": 154.01
}]
},
{"name": "Cuyahoga", "population": 1337,
"Attributes": [{
"capture_date": "2020-01-29",
"Spirit_code": "ASDF",
"value": 154.01
},
{
"capture_date": "2020-01-29",
"Spirit_code": "POIUY",
"value": 154.01
}]
}]
}
]
With pandas, I get the result I want using:
json_normalize(data["data"], ["counties", "Attributes"], ["state", "shortname", ["counties", "name"], ["counties", "population"]])
How can I get the same result as pandas' json_normalize using PySpark?
I know how to do this with pandas, but not with PySpark. The desired output, in normalized form, is:
state, shortname, name, population, attribute.capture_date, attribute.spirit_code, attribute.value
Florida, FL, Dade, 12345, 2020-01-29, TRLQR, 1
Florida, FL, Dade, 12345, 2020-01-29, HAVPN, 57000
Florida, FL, Broward, 40000, 2020-01-29, GMSTP, 14
Florida, FL, Broward, 40000, 2020-01-29, GWTPN, 11212
Florida, FL, Palm Beach, 60000, 2020-01-29, YGHMN, 154.01
Florida, FL, Palm Beach, 60000, 2020-01-29, CXZASD, 154.01
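
To make the target shape concrete, here is a minimal plain-Python sketch of the flattening I am after: one output row per entry in counties[].Attributes[], with the state and county fields repeated on each row. The function name `flatten` and the trimmed `sample` are only illustrative assumptions, and the column names follow the desired output above rather than the exact names pandas produces.

```python
import json

def flatten(states):
    """Return one flat dict per entry in counties[].Attributes[]."""
    rows = []
    for state in states:                      # top-level records
        for county in state["counties"]:      # first nested array
            for attr in county["Attributes"]: # second nested array
                rows.append({
                    "state": state["state"],
                    "shortname": state["shortname"],
                    "name": county["name"],
                    "population": county["population"],
                    "capture_date": attr["capture_date"],
                    "Spirit_code": attr["Spirit_code"],
                    "value": attr["value"],
                })
    return rows

# Trimmed sample taken from the data in the question (hypothetical subset).
sample = json.loads("""
[{"state": "Florida",
  "shortname": "FL",
  "info": {"governor": "Rick Scott"},
  "counties": [{"name": "Dade",
                "population": 12345,
                "Attributes": [
                  {"capture_date": "2020-01-29", "Spirit_code": "TRLQR", "value": 1},
                  {"capture_date": "2020-01-29", "Spirit_code": "HAVPN", "value": 57000}
                ]}]}]
""")

rows = flatten(sample)
for row in rows:
    print(row)
```

Each loop level here unnests one array. In PySpark the analogous building block is presumably `pyspark.sql.functions.explode`, applied once per nested array, but I have not worked out the full query.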