Extracting the text between every occurrence of two substrings in a text file - PullRequest
2 votes
/ July 1, 2019

Below is a sample of text from a log file. I need to extract all the text between every occurrence of "Event uploaded" and the next "}". I have also included an example of what I need returned (note that this is only an example; I will be applying the method to a more general case). The formatting of my output is not great, it just shows the idea. Getting close to this is fine, I can format it from there; the content is what matters most:

Input:

2019-06-28 15:02:09:918 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Activate assertion: <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:920 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] New process assertion state; preventSuspend, preventThrottleDownUI, preventThrottleDownCPU, preventIdleSleep, preventSuspendOnSleep (assertion 0x11ff1e710 added: preventIdleSleep; removed: (none))
2019-06-28 15:02:09:921 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Setting jetsam priority to 10 [0x10108]
2019-06-28 15:02:09:921 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Creating PowerAssertion on abc-rrre:365
2019-06-28 15:02:09:922 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target powerd[35] <Notice>: Sleep revert state: 1
2019-06-28 15:02:09:926 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target powerd[35] <Notice>: Process assertiond.62 Created SystemIsActive "abc-rrre:365:365-6E62D75B-8078-47DE-9B22-988DD2F10162 [Shared Background Assertion 737 for el.defg.na.abcrrre2] [0x11ff1e710]" age:00:00:00  id:51539643064 [System: SysAct]
2019-06-28 15:02:09:926 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Created PowerAssertion on abc-rrre:365, sleep reverted
2019-06-28 15:02:09:926 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Client relinquished <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:927 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Deactivate assertion: <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:928 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] New process assertion state; preventSuspend, preventThrottleDownUI, preventThrottleDownCPU, preventSuspendOnSleep (assertion 0x11ff1e710 added: (none); removed: preventIdleSleep)
2019-06-28 15:02:09:929 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Setting jetsam priority to 10 [0x10100]
2019-06-28 15:02:09:929 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Releasing PowerAssertion on abc-rrre:365 from update
2019-06-28 15:02:09:930 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Remove assertion: <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:931 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target powerd[35] <Notice>: Process assertiond.62 Released SystemIsActive "abc-rrre:365:365-6E62D75B-8078-47DE-9B22-988DD2F10162 [Shared Background Assertion 737 for el.defg.na.abcrrre2] [0x11ff1e710]" age:00:00:00  id:51539643064 [System: SysAct]
2019-06-28 15:02:09:932 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: -[BKAssertion dealloc] - <0x11ff1e710>
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device—Target ABC-DEF[365] Notice: -[sendAllDataToServerWithDebug:] [Line 255] Event uploaded, ABCAccount : {
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     dcis = 0;
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     ttl = 0;
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     bb = 0;
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     r1 = 1234567890;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW]     pop = abc;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW]     origin = target;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW]     "tsn" = “l323f123f”;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW] }
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device—Target ABC-DEF[365] Notice: -[sendAllDataToServerWithDebug:] [Line 255] Event uploaded, ABCAccount : {
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     dcis = 0;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     ttl = 0;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     bb = 0;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     r1 = NA;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     pop = abc;
2019-06-28 15:02:09:936 - info: [bUSLog] [bUS_SYSLOG_ROW]     origin = source;
2019-06-28 15:02:09:936 - info: [bUSLog] [bUS_SYSLOG_ROW]     "tsn" = “lasdf23f23”;
2019-06-28 15:02:09:936 - info: [bUSLog] [bUS_SYSLOG_ROW] }
2019-06-28 15:02:09:936 - info: [bUSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device—Target ABC-DEF[365] Notice: -[sendAllDataToServerWithDebug:] [Line 255] Event uploaded, ABCAdditional : {
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     add1 = value;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     add2 = false;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     pop = abc;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     origin = target;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     “tsn” = “g254g34gg4g”;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     "time_zone" = EDT;
2019-06-28 15:02:09:938 - info: [bUSLog] [IOS_SYSLOG_ROW] }

Output:

ABCAccount : { dcis = 0; ttl = 0; bb = 0; r1 = 1234567890; pop = abc; origin = target; "tsn" = “l323f123f”;}
ABCAccount : { dcis = 0; ttl = 0; bb = 0; r1 = NA; pop = abc; origin = source; "tsn" = “lasdf23f23”;}
ABCAdditional : { add1 = value; add2 = false; pop = abc; origin = target; “tsn” = “g254g34gg4g”; "time_zone" = EDT;}

I tried using:

   start = 'Event uploaded, '
   end = '}'
   new = entry[entry.find(start)+len(start):entry.rfind(end)]

and several other methods (including regular expressions), but no luck... Any help would be appreciated, thanks!

Edit (attempt):

import sys

with open(target_logs) as log:
    do_print = False
    event_key = 'Event uploaded,'

    for line in log:
        line = line.strip()
        if do_print:
            sys.stdout.write(line[line.rfind(']') + 1:].strip())
        if event_key in line:
            do_print = True
            sys.stdout.write(line[line.find(event_key) + len(event_key):].strip())
        elif line.endswith('}'):
            do_print = False
            print()

The output I get:

2019-06-28 15:02:11:672 - info: [bUSLog] [BUS_SYSLOG_ROW] Jun 28 11:02:11 device--Target sharingd(WirelessProximity)[57] <Notice>: Nearby start scanning with data: scan request of type 16, blob: <>, mask <>, active: 0, duplicates: 0, screen on: 300, screen off: 300, rssi: -60, peers: (
2019-06-28 15:02:11:672 - info: [bUSLog] [BUS_SYSLOG_ROW]     "1A02F1A8-5597-4B1F-8802-BA022F789F81",
2019-06-28 15:02:11:673 - info: [bUSLog] [BUS_SYSLOG_ROW]     "A80A3D54-F8F2-D96B-598B-3EF0AE3ABC70",
2019-06-28 15:02:11:673 - info: [bUSLog] [BUS_SYSLOG_ROW]     "B4F0AC04-4A06-92EB-AA85-32002E6675BC",
2019-06-28 15:02:11:674 - info: [bUSLog] [BUS_SYSLOG_ROW]     "D9A5686A-C971-ADEB-A33F-2C772F351D45",
2019-06-28 15:02:11:674 - info: [bUSLog] [BUS_SYSLOG_ROW]     "5B66FA21-AA48-66D8-A619-1C0EA9190597",
2019-06-28 15:02:11:674 - info: [bUSLog] [BUS_SYSLOG_ROW]     "C540AC68-57DF-DA13-3C73-1129E2DD5A6D",
2019-06-28 15:02:11:674 - info: [bUSLog] [BUS_SYSLOG_ROW]     "CCD3C7C8-5069-C9C7-D4B5-FCAD9C2FA15F",
2019-06-28 15:02:11:675 - info: [bUSLog] [BUS_SYSLOG_ROW]     "E6A1699E-91BC-AEB1-DE99-C7C0FB440FAA",
2019-06-28 15:02:11:675 - info: [bUSLog] [BUS_SYSLOG_ROW]     "01480FF0-CD8D-C505-524D-CC139711A730"

Answers [ 2 ]

2 votes
/ July 1, 2019

As a first step we perform a substitution (regex101), then simply split after "}\n" and remove the newlines:

data = r'''2019-06-28 15:02:09:918 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Activate assertion: <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:920 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] New process assertion state; preventSuspend, preventThrottleDownUI, preventThrottleDownCPU, preventIdleSleep, preventSuspendOnSleep (assertion 0x11ff1e710 added: preventIdleSleep; removed: (none))
2019-06-28 15:02:09:921 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Setting jetsam priority to 10 [0x10108]
2019-06-28 15:02:09:921 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Creating PowerAssertion on abc-rrre:365
2019-06-28 15:02:09:922 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target powerd[35] <Notice>: Sleep revert state: 1
2019-06-28 15:02:09:926 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target powerd[35] <Notice>: Process assertiond.62 Created SystemIsActive "abc-rrre:365:365-6E62D75B-8078-47DE-9B22-988DD2F10162 [Shared Background Assertion 737 for el.defg.na.abcrrre2] [0x11ff1e710]" age:00:00:00  id:51539643064 [System: SysAct]
2019-06-28 15:02:09:926 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Created PowerAssertion on abc-rrre:365, sleep reverted
2019-06-28 15:02:09:926 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Client relinquished <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:927 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Deactivate assertion: <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:928 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] New process assertion state; preventSuspend, preventThrottleDownUI, preventThrottleDownCPU, preventSuspendOnSleep (assertion 0x11ff1e710 added: (none); removed: preventIdleSleep)
2019-06-28 15:02:09:929 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Setting jetsam priority to 10 [0x10100]
2019-06-28 15:02:09:929 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Releasing PowerAssertion on abc-rrre:365 from update
2019-06-28 15:02:09:930 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Remove assertion: <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:931 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target powerd[35] <Notice>: Process assertiond.62 Released SystemIsActive "abc-rrre:365:365-6E62D75B-8078-47DE-9B22-988DD2F10162 [Shared Background Assertion 737 for el.defg.na.abcrrre2] [0x11ff1e710]" age:00:00:00  id:51539643064 [System: SysAct]
2019-06-28 15:02:09:932 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: -[BKAssertion dealloc] - <0x11ff1e710>
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device—Target ABC-DEF[365] Notice: -[sendAllDataToServerWithDebug:] [Line 255] Event uploaded, ABCAccount : {
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     dcis = 0;
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     ttl = 0;
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     bb = 0;
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     r1 = 1234567890;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW]     pop = abc;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW]     origin = target;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW]     "tsn" = “l323f123f”;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW] }
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device—Target ABC-DEF[365] Notice: -[sendAllDataToServerWithDebug:] [Line 255] Event uploaded, ABCAccount : {
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     dcis = 0;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     ttl = 0;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     bb = 0;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     r1 = NA;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     pop = abc;
2019-06-28 15:02:09:936 - info: [bUSLog] [bUS_SYSLOG_ROW]     origin = source;
2019-06-28 15:02:09:936 - info: [bUSLog] [bUS_SYSLOG_ROW]     "tsn" = “lasdf23f23”;
2019-06-28 15:02:09:936 - info: [bUSLog] [bUS_SYSLOG_ROW] }
2019-06-28 15:02:09:936 - info: [bUSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device—Target ABC-DEF[365] Notice: -[sendAllDataToServerWithDebug:] [Line 255] Event uploaded, ABCAdditional : {
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     add1 = value;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     add2 = false;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     pop = abc;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     origin = target;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     “tsn” = “g254g34gg4g”;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     "time_zone" = EDT;
2019-06-28 15:02:09:938 - info: [bUSLog] [IOS_SYSLOG_ROW] }'''

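import re

# Strip the log prefix; a message starting with an uppercase letter is dropped
# up to 'Event uploaded,' (or entirely when the marker is absent).
data = re.sub(r'^.*SYSLOG_ROW\]\s*(?:[A-Z].+?(?=Event uploaded,|$))?', r'', data, flags=re.M)
data = re.sub(r'^"[^"]+",?$', r'', data, flags=re.M)
for chunk in re.split(r'(?<=})\n', data):
    row = chunk.replace('\n', '')
    # str.lstrip() strips a set of characters, not a prefix, so remove the marker with a regex.
    row = re.sub(r'^Event uploaded,\s*', '', row)
    print(row)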

Prints:

ABCAccount : {dcis = 0;ttl = 0;bb = 0;r1 = 1234567890;pop = abc;origin = target;"tsn" = “l323f123f”;}
ABCAccount : {dcis = 0;ttl = 0;bb = 0;r1 = NA;pop = abc;origin = source;"tsn" = “lasdf23f23”;}
ABCAdditional : {add1 = value;add2 = false;pop = abc;origin = target;“tsn” = “g254g34gg4g”;"time_zone" = EDT;}

EDIT (to read from a file):

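import re

with open('log.txt', 'r') as f_in:
    data = f_in.read()

# Strip the log prefix; a message starting with an uppercase letter is dropped
# up to 'Event uploaded,' (or entirely when the marker is absent).
data = re.sub(r'^.*SYSLOG_ROW\]\s*(?:[A-Z].+?(?=Event uploaded,|$))?', r'', data, flags=re.M)
# Drop lines that consist only of a quoted item such as "1A02F1A8-...",
data = re.sub(r'^"[^"]+",?$', r'', data, flags=re.M)
for chunk in re.split(r'(?<=})\n', data):
    row = chunk.replace('\n', '')
    # str.lstrip() strips a set of characters, not a prefix, so remove the marker with a regex.
    row = re.sub(r'^Event uploaded,\s*', '', row)
    if row:  # a trailing newline in the file leaves an empty last chunk
        print(row)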
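
As an alternative sketch (not the approach shown above), the blocks can also be captured directly with one non-greedy pattern and cleaned up afterwards. The names extract_events, BLOCK and PREFIX are made up for illustration. It assumes that no value inside a block contains "}" and that every continuation line carries a prefix ending in "SYSLOG_ROW]"; the sample lines are shortened, not real log output.

```python
import re

# Everything from the marker up to the first following '}' (re.S lets '.' cross newlines).
BLOCK = re.compile(r'Event uploaded,\s*(.*?\})', flags=re.S)
# The log prefix at the start of a line, up to and including 'SYSLOG_ROW]'.
PREFIX = re.compile(r'^.*?SYSLOG_ROW\]\s*', flags=re.M)


def extract_events(text):
    """Return one single-line string per 'Event uploaded, ... }' block."""
    events = []
    for raw in BLOCK.findall(text):
        cleaned = PREFIX.sub('', raw)            # drop the prefix of each continuation line
        events.append(' '.join(cleaned.split()))  # collapse newlines and runs of spaces
    return events


# Shortened, hypothetical sample in the same shape as the log in the question.
sample = (
    "2019-06-28 15:02:09:932 - info: [bUSLog] [bUS_SYSLOG_ROW] Sleep revert state: 1\n"
    "2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW] Notice: [Line 255] Event uploaded, ABCAccount : {\n"
    "2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     dcis = 0;\n"
    "2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW] }\n"
)
for event in extract_events(sample):
    print(event)
```

Returning a list instead of printing makes it easier to reformat the rows later, which is what the question ultimately asks for.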

1 vote
/ July 1, 2019

An iterative approach (for Python 3.x):

with open('log.txt') as log:
    do_print = False
    event_key = 'Event uploaded,'  # starting marker

    for line in log:
        line = line.strip()
        if do_print: print(line[line.rfind(']') + 1:].strip(), end=' ')
        if event_key in line:
            do_print = True
            print(line[line.find(event_key) + len(event_key):].strip(), end=' ')
        elif line.endswith('}'):
            do_print = False
            print()

Output:

ABCAccount : { dcis = 0; ttl = 0; bb = 0; r1 = 1234567890; pop = abc; origin = target; "tsn" = “l323f123f”; } 
ABCAccount : { dcis = 0; ttl = 0; bb = 0; r1 = NA; pop = abc; origin = source; "tsn" = “lasdf23f23”; } 
ABCAdditional : { add1 = value; add2 = false; pop = abc; origin = target; “tsn” = “g254g34gg4g”; "time_zone" = EDT; } 

For older Python versions, sys.stdout.write can be used instead of print(..., end=' ').
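
One way to avoid the print/write difference altogether is to collect each block into a string and write it out once it is complete. The sketch below is only an illustration: iter_events is a hypothetical helper name, the sample lines are shortened, and like the loop above it assumes the payload of a continuation line is whatever follows the last "]".

```python
import sys


def iter_events(lines, event_key='Event uploaded,'):
    """Yield one string per block: from event_key up to a line ending in '}'."""
    parts = None  # None means we are not inside a block
    for line in lines:
        line = line.strip()
        if event_key in line:
            # Start a new block with the text that follows the marker.
            parts = [line.split(event_key, 1)[1].strip()]
        elif parts is not None:
            # Continuation line: keep only what follows the last ']'.
            parts.append(line[line.rfind(']') + 1:].strip())
        if parts is not None and line.endswith('}'):
            yield ' '.join(parts)
            parts = None


# Shortened, hypothetical sample lines; an open file object can be passed the same way.
sample = [
    '... [bUS_SYSLOG_ROW] Notice: [Line 255] Event uploaded, ABCAccount : {\n',
    '... [bUS_SYSLOG_ROW]     dcis = 0;\n',
    '... [bUS_SYSLOG_ROW] }\n',
    '... [bUS_SYSLOG_ROW] Sleep revert state: 1\n',
]
for event in iter_events(sample):
    sys.stdout.write(event + '\n')
```

Because the closing "}" is only checked while a block is open, a stray line ending in "}" elsewhere in the log does not produce an empty row.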
