Разбор вложенного JSON с Python: TypeError: индексы списка должны быть целыми числами, а не str - PullRequest
2 голосов
/ 11 июля 2019

У меня есть вложенные данные, которые я хотел бы вставить в кадр данных Pandas из JSON, но мой JSON вложен и выдает ошибку

ниже приведены данные

{"data":[{"date":"2018-08-20T00:00:00","values":[{"account":"account_1","device":"device_1","deviceModel":"testdev","id":"id_1","Events":[{"EventCategory":"Scan","EventCategoryData":[{"name":"scanname","info":[{"type":"any","count":8.0}]},{"name":"scanname","info":[{"type":"any","count":1.0}]}],"scancount":2.0},{"EventCategory":"Web","EventCategoryData":[{"name":"web_Scan","info":[{"type":"Web","count":2.0}]},{"name":"web scan 2","info":[{"type":"Web 2","count":0.0}]},{"name":"web 3 ","info":[{"type":"Web 3","count":2.0}]}]},{"EventCategory":"WWW","EventCategoryData":[{"name":"any","info":[{"type":"wifi","count":2.0}]}],"scancount":4.0},{"EventCategory":"Others","EventCategoryData":[{"name":"anything","info":[{"previousversion":"default","updatedversion":"default"}]}]}]}]},{"date":"2018-08-22T00:00:00","values":[{"account":"account_1","device":"device_1","deviceModel":"testdev","id":"id_2","Events":[{"EventCategory":"Scan2","EventCategoryData":[{"name":"scan name","info":[{"type":"scan 2","count":2}]},{"name":"update","info":[{"type":"scan","count":1},{"type":"WWW","count":1}]}],"scancount":1},{"EventCategory":"Web","EventCategoryData":[{"name":"web1","info":[{"type":"WWW","count":1}]},{"name":"Wifi","info":[{"type":"Web Sites","count":1}]},{"name":"web2","info":[{"type":"scan","count":1}]}]}]}]}],"status":"success"}

Я пытался json_normalize

normalize_data = json_normalize(data['data'],['values'], record_path ='EventCategory' ,errors='ignore')

TypeError: json_normalize() got multiple values for argument 'record_path'

Я хочу построить фрейм данных со всем ключом в виде столбца и значениями в виде строк. Любая помощь здесь, пожалуйста

1 Ответ

0 голосов
/ 11 июля 2019

json_normalize () - Невозможно сделать это полностью универсальным способом, используя json_normalize(). Вы можете использовать аргументы record_path и meta, чтобы указать, как вы хотите обрабатывать JSON.

from pandas.io.json import json_normalize

data ={"data":[{"date":"2018-08-20T00:00:00","values":[{"account":"account_1","device":"device_1","deviceModel":"testdev","id":"id_1","Events":[{"EventCategory":"Scan","EventCategoryData":[{"name":"scanname","info":[{"type":"any","count":8.0}]},{"name":"scanname","info":[{"type":"any","count":1.0}]}],"scancount":2.0},{"EventCategory":"Web","EventCategoryData":[{"name":"web_Scan","info":[{"type":"Web","count":2.0}]},{"name":"web scan 2","info":[{"type":"Web 2","count":0.0}]},{"name":"web 3 ","info":[{"type":"Web 3","count":2.0}]}]},{"EventCategory":"WWW","EventCategoryData":[{"name":"any","info":[{"type":"wifi","count":2.0}]}],"scancount":4.0},{"EventCategory":"Others","EventCategoryData":[{"name":"anything","info":[{"previousversion":"default","updatedversion":"default"}]}]}]}]},{"date":"2018-08-22T00:00:00","values":[{"account":"account_1","device":"device_1","deviceModel":"testdev","id":"id_2","Events":[{"EventCategory":"Scan2","EventCategoryData":[{"name":"scan name","info":[{"type":"scan 2","count":2}]},{"name":"update","info":[{"type":"scan","count":1},{"type":"WWW","count":1}]}],"scancount":1},{"EventCategory":"Web","EventCategoryData":[{"name":"web1","info":[{"type":"WWW","count":1}]},{"name":"Wifi","info":[{"type":"Web Sites","count":1}]},{"name":"web2","info":[{"type":"scan","count":1}]}]}]}]}],"status":"success"}

#merge all data['data] multiple list of data['value'] into single list
flat_list = [item for sublist in data['data'] for item in sublist['values']]
result = json_normalize(flat_list, record_path=['Events','EventCategoryData','info'],\
                        meta=['account','device','deviceModel','id',['Events','EventCategory'],\
                              ['Events','EventCategory','name']])
print(result)

O / P:

    count previousversion       type updatedversion    account    device deviceModel    id Events.EventCategory Events.EventCategory.name
0     8.0             NaN        any            NaN  account_1  device_1     testdev  id_1                 Scan                  scanname
1     1.0             NaN        any            NaN  account_1  device_1     testdev  id_1                 Scan                  scanname
2     2.0             NaN        Web            NaN  account_1  device_1     testdev  id_1                  Web                  web_Scan
3     0.0             NaN      Web 2            NaN  account_1  device_1     testdev  id_1                  Web                web scan 2
4     2.0             NaN      Web 3            NaN  account_1  device_1     testdev  id_1                  Web                    web 3 
5     2.0             NaN       wifi            NaN  account_1  device_1     testdev  id_1                  WWW                       any
6     NaN         default        NaN        default  account_1  device_1     testdev  id_1               Others                  anything
7     2.0             NaN     scan 2            NaN  account_1  device_1     testdev  id_2                Scan2                 scan name
8     1.0             NaN       scan            NaN  account_1  device_1     testdev  id_2                Scan2                    update
9     1.0             NaN        WWW            NaN  account_1  device_1     testdev  id_2                Scan2                    update
10    1.0             NaN        WWW            NaN  account_1  device_1     testdev  id_2                  Web                      web1
11    1.0             NaN  Web Sites            NaN  account_1  device_1     testdev  id_2                  Web                      Wifi
12    1.0             NaN       scan            NaN  account_1  device_1     testdev  id_2                  Web                      web2

Обновление:

#merge all data['data] multiple list into single list and merge date items into values sublist of dict.
flat_list = []
for sublist in data['data']:
    new_list = [item for item in sublist['values']]
    new_list[0]['date'] = sublist['date']
    flat_list.extend(new_list)

result = json_normalize(flat_list, record_path=['Events','EventCategoryData','info'],\
                        meta=['account','device','deviceModel','id','date',['Events','EventCategory'],\
                              ['Events','EventCategory','name']])

print(result)
* * 1016 O / P:
    count previousversion       type updatedversion  ...    id                 date Events.EventCategory Events.EventCategory.name
0     8.0             NaN        any            NaN  ...  id_1  2018-08-20T00:00:00                 Scan                  scanname
1     1.0             NaN        any            NaN  ...  id_1  2018-08-20T00:00:00                 Scan                  scanname
2     2.0             NaN        Web            NaN  ...  id_1  2018-08-20T00:00:00                  Web                  web_Scan
3     0.0             NaN      Web 2            NaN  ...  id_1  2018-08-20T00:00:00                  Web                web scan 2
4     2.0             NaN      Web 3            NaN  ...  id_1  2018-08-20T00:00:00                  Web                    web 3 
5     2.0             NaN       wifi            NaN  ...  id_1  2018-08-20T00:00:00                  WWW                       any
6     NaN         default        NaN        default  ...  id_1  2018-08-20T00:00:00               Others                  anything
7     2.0             NaN     scan 2            NaN  ...  id_2  2018-08-22T00:00:00                Scan2                 scan name
8     1.0             NaN       scan            NaN  ...  id_2  2018-08-22T00:00:00                Scan2                    update
9     1.0             NaN        WWW            NaN  ...  id_2  2018-08-22T00:00:00                Scan2                    update
10    1.0             NaN        WWW            NaN  ...  id_2  2018-08-22T00:00:00                  Web                      web1
11    1.0             NaN  Web Sites            NaN  ...  id_2  2018-08-22T00:00:00                  Web                      Wifi
12    1.0             NaN       scan            NaN  ...  id_2  2018-08-22T00:00:00                  Web                      web2

[13 rows x 11 columns]
Добро пожаловать на сайт PullRequest, где вы можете задавать вопросы и получать ответы от других членов сообщества.
...