I have input text like this:
KPi.033 Name: 1: RI 2: WON HO 3: 4: na
Title: na Designation: DPRK Ministry of State Security Official DOB: 17 Jul. 1964 POB: na Good quality a.k.a.:
na Low quality a.k.a.: na Nationality: Democratic People's Republic of Korea Passport no: 381310014 National
identification no: na Address: na Listed on: 30 Nov. 2016 Other information:
KPi.037 Name: 1: CHANG 2: CHANG HA 3: 4: na
Title: na Designation: President of the Second Academy of Natural Sciences (SANS) DOB: 10 Jan. 1964 POB:
na Good quality a.k.a.: Jang Chang Ha Low quality a.k.a.: na Nationality: Democratic People's Republic of Korea
Passport no: na National identification no: na Address: na Listed on: 30 Nov. 2016 Other information:
KPi.038 Name: 1: CHO 2: CHUN RYONG 3: 4: na
Title: na Designation: Chairman of the Second Economic Committee (SEC) DOB: 4 Apr. 1960 POB: na Good
quality a.k.a.: Jo Chun Ryong Low quality a.k.a.: na Nationality: Democratic People's Republic of Korea
Passport no: na National identification no: na Address: na Listed on: 30 Nov. 2016 Other information:
From this I want to extract only the full name (that is, the parts labelled 1:, 2:, 3: and 4:), the nationality, and the address.
This is what I have done so far:
import re
from io import open

regex = r"((Name\:[^$\n]+) |(Address:[^\:$]+) |(Nationality:[^\:]+))"
#regex2 = r"(Address:[^\:$]+)"

archivo = open('lista1.22.txt','r', encoding='utf-8')
lineas = archivo.readlines()
archivo.close()

# I know this is a .txt and not a JSON file; it is only here to check how the script behaves
archivo2 = open('resultado3.txt','w+', encoding='utf-8')
matches=()
for linea in lineas:
    matches = re.findall(regex, linea)
    for match in matches:
        #print(match)
        archivo2.writelines(match)
archivo2.close()
-------------------------------
But the output file I get has everything written on a single line. I want to build a dictionary that can be dumped to JSON for import into a database.
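One possible way to get there, as a rough sketch rather than a definitive solution: collapse the line breaks first (labels such as "National identification no:" wrap across lines), cut the text into one chunk per "KPi.NNN Name:" entry, and pull each field out by using the label that follows it as the end marker. The sketch assumes every entry lists its labels in the same order as the sample above and that "na" means "no value". The names `parse_records`, `clean` and the `SAMPLE` string are made up for illustration.

```python
import json
import re

# Sample copied from the input shown above (assumed to be representative).
SAMPLE = """KPi.033 Name: 1: RI 2: WON HO 3: 4: na
Title: na Designation: DPRK Ministry of State Security Official DOB: 17 Jul. 1964 POB: na Good quality a.k.a.:
na Low quality a.k.a.: na Nationality: Democratic People's Republic of Korea Passport no: 381310014 National
identification no: na Address: na Listed on: 30 Nov. 2016 Other information:
KPi.037 Name: 1: CHANG 2: CHANG HA 3: 4: na
Title: na Designation: President of the Second Academy of Natural Sciences (SANS) DOB: 10 Jan. 1964 POB:
na Good quality a.k.a.: Jang Chang Ha Low quality a.k.a.: na Nationality: Democratic People's Republic of Korea
Passport no: na National identification no: na Address: na Listed on: 30 Nov. 2016 Other information:
KPi.038 Name: 1: CHO 2: CHUN RYONG 3: 4: na
Title: na Designation: Chairman of the Second Economic Committee (SEC) DOB: 4 Apr. 1960 POB: na Good
quality a.k.a.: Jo Chun Ryong Low quality a.k.a.: na Nationality: Democratic People's Republic of Korea
Passport no: na National identification no: na Address: na Listed on: 30 Nov. 2016 Other information:
"""

# One entry runs from "KPi.NNN Name:" up to the next such header or the end of the text.
RECORD = re.compile(r"KPi\.\d+\s+Name:(.*?)(?=KPi\.\d+\s+Name:|$)")
# Each field ends where the next label starts (assumes the label order of the sample).
NAME = re.compile(r"1:(.*?)2:(.*?)3:(.*?)4:(.*?)Title:")
NATIONALITY = re.compile(r"Nationality:(.*?)Passport no:")
ADDRESS = re.compile(r"Address:(.*?)Listed on:")


def clean(value):
    """Strip whitespace and treat 'na' or an empty string as a missing value."""
    value = value.strip()
    return None if value in ("", "na") else value


def parse_records(text):
    """Return a list of dicts with the name, nationality and address of each entry."""
    text = " ".join(text.split())  # undo the line wrapping inside the entries
    records = []
    for record in RECORD.finditer(text):
        body = record.group(1)
        name = NAME.search(body)
        nationality = NATIONALITY.search(body)
        address = ADDRESS.search(body)
        if not (name and nationality and address):
            continue  # skip entries that do not follow the assumed layout
        parts = [clean(part) for part in name.groups()]
        records.append({
            "name": " ".join(part for part in parts if part),
            "nationality": clean(nationality.group(1)),
            "address": clean(address.group(1)),
        })
    return records


if __name__ == "__main__":
    # ensure_ascii=False keeps any non-ASCII characters readable in the JSON.
    print(json.dumps(parse_records(SAMPLE), indent=2, ensure_ascii=False))
```

To run this on the real file, read it whole with `open('lista1.22.txt', encoding='utf-8').read()` instead of line by line, pass that string to `parse_records`, and write the result with `json.dump`. Reading line by line is part of what makes the original attempt hard to get right, because a single entry spans several lines.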