Convert text to JSON - PullRequest
0 votes
/ January 22, 2020

I have input text like this:

KPi.033 Name:  1: RI  2: WON HO 3:  4: na
Title: na  Designation: DPRK Ministry of State Security Official   DOB: 17 Jul. 1964 POB: na Good quality a.k.a.:
na Low quality a.k.a.: na Nationality: Democratic People's Republic of Korea Passport no:  381310014 National
identification no: na Address:  na Listed on: 30 Nov. 2016  Other information: 

KPi.037 Name:  1: CHANG 2: CHANG HA 3:  4: na
Title: na  Designation: President of the Second Academy of Natural Sciences (SANS)   DOB: 10 Jan. 1964 POB:
na Good quality a.k.a.: Jang Chang Ha Low quality a.k.a.: na Nationality: Democratic People's Republic of Korea
Passport no: na National identification no: na Address:  na Listed on: 30 Nov. 2016  Other information:

KPi.038 Name:  1: CHO  2: CHUN RYONG  3:  4: na
Title: na  Designation: Chairman of the Second Economic Committee (SEC)   DOB: 4 Apr. 1960 POB: na Good
quality a.k.a.: Jo Chun Ryong  Low quality a.k.a.: na Nationality: Democratic People's Republic of Korea 
Passport no: na National identification no: na Address:  na Listed on: 30 Nov. 2016  Other information:

From this I want to extract only the full name (that is, parts 1:, 2:, 3: and 4:), the nationality, and the address.

What I have done so far:

import re
from io import open

regex = r"((Name\:[^$\n]+) |(Address:[^\:$]+) |(Nationality:[^\:]+))"
#regex2 = r"(Address:[^\:$]+)"
archivo = open('lista1.22.txt','r', encoding='utf-8')
lineas = archivo.readlines()
archivo.close()
archivo2 = open('resultado3.txt','w+', encoding='utf-8') #i know is txt, not a json is just to check how its working scripting
matches=()


for linea in lineas:
  matches = re.findall(regex, linea) 

  for match in matches:

    #print(match)

    archivo2.writelines(match)

-------------------------------

But the new document I get has everything written on a single line. I want to build a dictionary that can be dumped as JSON for import into a database.

Answers [ 2 ]

1 vote
/ January 22, 2020
import json

with open(<file_name>) as file_handler:
    data = json.load(file_handler)
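
Note that `json.load` reads JSON that already exists. For the direction the question asks about, turning a list of dictionaries into JSON, `json.dump` is the counterpart. A minimal sketch follows; the sample record and the helper names `save_records` and `load_records` are made up for illustration and are not part of the original answer.

```python
import json

# Hypothetical records, shaped like the fields the question asks for.
records = [
    {
        "Name": "RI WON HO",
        "Nationality": "Democratic People's Republic of Korea",
        "Address": "na",
    },
]


def save_records(records, path):
    """Write a list of dicts to `path` as a JSON array."""
    with open(path, "w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps non-ASCII characters readable in the file.
        json.dump(records, fh, ensure_ascii=False, indent=2)


def load_records(path):
    """Read the JSON array back into a list of dicts."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Calling `save_records(records, "resultado.json")` (the file name is an assumption) would produce a file that most databases with a JSON import can consume.
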
0 votes
/ January 22, 2020

Brute-force approach

import json
import re

txt = """
KPi.033 Name:  1: RI  2: WON HO 3:  4: na
Title: na  Designation: DPRK Ministry of State Security Official   DOB: 17 Jul. 1964 POB: na Good quality a.k.a.:
na Low quality a.k.a.: na Nationality: Democratic People's Republic of Korea Passport no:  381310014 National
identification no: na Address:  na Listed on: 30 Nov. 2016  Other information: 

KPi.037 Name:  1: CHANG 2: CHANG HA 3:  4: na
Title: na  Designation: President of the Second Academy of Natural Sciences (SANS)   DOB: 10 Jan. 1964 POB:
na Good quality a.k.a.: Jang Chang Ha Low quality a.k.a.: na Nationality: Democratic People's Republic of Korea
Passport no: na National identification no: na Address:  na Listed on: 30 Nov. 2016  Other information:

KPi.038 Name:  1: CHO  2: CHUN RYONG  3:  4: na
Title: na  Designation: Chairman of the Second Economic Committee (SEC)   DOB: 4 Apr. 1960 POB: na Good
quality a.k.a.: Jo Chun Ryong  Low quality a.k.a.: na Nationality: Democratic People's Republic of Korea 
Passport no: na National identification no: na Address:  na Listed on: 30 Nov. 2016  Other information:
"""

# Split on single-word labels such as "Name:" and keep the labels in the result.
l = re.split(r"\s(\w+:)\s", txt)

users = []
get_user = lambda: { 'Name': [], 'Address':'', 'Nationality': ''}

user = get_user()
for i, x in enumerate(l):
    if x in ["1:", "2:", "3:", "4:"]:
        if len(l[i+1]) > 1 and 'na' != l[i+1]:
            user['Name'].append(l[i+1].strip())
    elif x == "Nationality:":
        user['Nationality'] = l[i+1].strip()

    elif x == "Address:":
        user['Address'] = l[i+1].strip()
        users.append(user)
        user['Name'] = ' '.join(user['Name'])
        user = get_user()

json.dumps(users)

Output

'[{"Name": "RI WON HO", "Address": "na Listed", "Nationality": "Democratic People\'s Republic of Korea Passport"}, {"Name": "CHANG CHANG HA", "Address": "na Listed", "Nationality": "Democratic People\'s Republic of Korea\\nPassport"}, {"Name": "CHO CHUN RYONG", "Address": "na Listed", "Nationality": "Democratic People\'s Republic of Korea \\nPassport"}]'
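
In the output above, `Address` comes out as `na Listed` and `Nationality` ends with `Passport`, because `\w+:` matches only the last word of multi-word labels such as `Listed on:` and `Passport no:`. One possible way around this, sketched below, is to split on the full label names instead. The label list is read off the sample input and is assumed to be complete; the names `LABELS`, `LABEL_RE` and `parse_entries` are hypothetical.

```python
import json
import re

SAMPLE = """
KPi.033 Name:  1: RI  2: WON HO 3:  4: na
Title: na  Designation: DPRK Ministry of State Security Official   DOB: 17 Jul. 1964 POB: na Good quality a.k.a.:
na Low quality a.k.a.: na Nationality: Democratic People's Republic of Korea Passport no:  381310014 National
identification no: na Address:  na Listed on: 30 Nov. 2016  Other information:

KPi.038 Name:  1: CHO  2: CHUN RYONG  3:  4: na
Title: na  Designation: Chairman of the Second Economic Committee (SEC)   DOB: 4 Apr. 1960 POB: na Good
quality a.k.a.: Jo Chun Ryong  Low quality a.k.a.: na Nationality: Democratic People's Republic of Korea
Passport no: na National identification no: na Address:  na Listed on: 30 Nov. 2016  Other information:
"""

# Field labels as they appear in the sample input (assumed to be the full set).
LABELS = [
    "Name", "Title", "Designation", "DOB", "POB",
    "Good quality a.k.a.", "Low quality a.k.a.", "Nationality",
    "Passport no", "National identification no", "Address",
    "Listed on", "Other information",
]

# Longest labels first, so a longer label is never shadowed by a shorter one.
LABEL_RE = re.compile(
    r"\b("
    + "|".join(re.escape(label) for label in sorted(LABELS, key=len, reverse=True))
    + r"):"
)


def parse_entries(text):
    """Return one dict per blank-line-separated record in `text`."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Collapse line breaks so labels wrapped across lines
        # ("National\nidentification no:") can be matched again.
        flat = " ".join(block.split())
        # parts = [prefix, label, value, label, value, ...]
        parts = LABEL_RE.split(flat)
        fields = {
            label: value.strip()
            for label, value in zip(parts[1::2], parts[2::2])
        }
        if "Name" not in fields:
            continue
        # Name holds numbered components "1: ... 2: ..."; drop empty and "na" ones.
        pieces = [p.strip() for p in re.split(r"\b[1-4]:", fields["Name"])]
        entries.append({
            "Name": " ".join(p for p in pieces if p and p != "na"),
            "Nationality": fields.get("Nationality", ""),
            "Address": fields.get("Address", ""),
        })
    return entries


if __name__ == "__main__":
    print(json.dumps(parse_entries(SAMPLE), indent=2))
```

With this approach the values stop at the next full label, so `Address` stays `na` and `Nationality` no longer picks up the word `Passport`. If the real file contains labels that are not in `LABELS`, they would end up inside the preceding value, so the list may need extending.
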