The code to clean up this messy data uses partly regular expressions and partly string splitting.
The cleaned CSV is written with the csv module because commas inside the text need to be quoted
(for example, in the line with Old:111, New:222, ...).
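To illustrate why the quoting matters, here is a minimal sketch comparing a naive `",".join` with `csv.writer`. The sample row is taken from the demo data; the in-memory `io.StringIO` buffer is only an assumption for illustration, the actual code writes to a file.

```python
import csv
import io

row = ["01-01-1998 00:14:14 AM", "SC:", "D1",
       "test. Old:111, New:222, Calculated was 123, out of 120 secs."]

# Naive join: the commas inside the message turn into extra delimiters.
naive = ",".join(row)
naive_fields = naive.split(",")

# csv.writer with QUOTE_ALL wraps every field in quotes,
# so commas inside a field no longer split it.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(row)
quoted = buf.getvalue()

# Reading the quoted line back yields the original four fields.
parsed = next(csv.reader(io.StringIO(quoted)))

print(len(naive_fields))  # 7 instead of 4
print(parsed == row)      # True
```

With the naive join, a reader can no longer tell which commas separate columns; the quoted form round-trips cleanly.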
Create a demo file:
with open("data.txt", "w") as w:
    w.write("""01-01-1998 00:00:00 AM GP: D(B): 1234 to time difference. Hourly Avg:-3 secs
01-01-1998 00:00:12 AM GP: D(A): 2345 to time difference. Hourly Avg:0 secs
01-01-1998 00:08:08 AM SYS: The Screen Is now minimised.
01-01-1998 00:09:10 AM 00:09:10 AM SC: Findcorrect: W. D:1. Count one two three four five. #there are somehow some glitch in the system showing 2 timestamp
01-01-1998 00:14:14 AM SC: D1 test. Old:111, New:222, Calculated was 123, out of 120 secs.
01-01-1998 01:06:24 AM ET: Program Disconnected event.""")
Parse and write:
import re

def parseLine(line):
    # get the timestamp
    ts = re.match(r"\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2} +(?:AM|PM)", line)
    # get everything but the timestamp, which also removes a doubled time
    cleaned = re.sub(r"^\d{2}-\d{2}-\d{4} (\d{2}:\d{2}:\d{2} (AM|PM) +)+", "", line)
    # split the cleaned part depending on the occurrence of "D(A)", "D(B)", "D1", "D2"
    if any(k in cleaned.split(":")[1] for k in ["D(A)", "D(B)", "D1", "D2"]):
        system, di, msg = cleaned.split(" ", maxsplit=2)
    else:
        di = ""
        system, msg = cleaned.split(":", maxsplit=1)
    # return each line as a list of cleaned fields
    return [ts[0].strip(), system.strip(), di.strip(), msg.strip()]

# fixed header, lines will be appended
p = [['Timestamp','System','Di','Message']]
with open("data.txt", "r") as r:
    for l in r:
        l = l.strip()
        p.append(parseLine(l))

import csv
with open("c.csv", "w", newline="") as w:
    writer = csv.writer(w, quoting=csv.QUOTE_ALL)
    writer.writerows(p)
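The substitution pattern in parseLine is what handles the doubled time. As a small sketch of its behaviour (the names `prefix`, `single` and `double` are only illustrative, and the second sample line is shortened from the demo data), the repeated group `(... (AM|PM) +)+` consumes one or more time stamps after the date:

```python
import re

# Same pattern as in parseLine: a date followed by one or more "time AM/PM" groups.
prefix = re.compile(r"^\d{2}-\d{2}-\d{4} (\d{2}:\d{2}:\d{2} (AM|PM) +)+")

single = "01-01-1998 00:08:08 AM SYS: The Screen Is now minimised."
double = "01-01-1998 00:09:10 AM 00:09:10 AM SC: Findcorrect: W. D:1."

cleaned_single = prefix.sub("", single)
cleaned_double = prefix.sub("", double)

print(cleaned_single)  # SYS: The Screen Is now minimised.
print(cleaned_double)  # SC: Findcorrect: W. D:1.
```

Both lines end up starting with the system name, so the later splitting logic does not need a special case for the glitch.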
Read and print the written file:
with open("c.csv") as r:
    print(r.read())
File content (quoted CSV; otherwise test. Old:111, New:222, Calculated was 123, ... would mangle your format):
"Timestamp","System","Di","Message"
"01-01-1998 00:00:00 AM","GP:","D(B):","1234 to time difference. Hourly Avg:-3 secs"
"01-01-1998 00:00:12 AM","GP:","D(A):","2345 to time difference. Hourly Avg:0 secs"
"01-01-1998 00:08:08 AM","SYS","","The Screen Is now minimised."
"01-01-1998 00:09:10 AM","SC","","Findcorrect: W. D:1. Count one two three four five. #there are somehow some glitch in the system showing 2 timestamp"
"01-01-1998 00:14:14 AM","SC:","D1","test. Old:111, New:222, Calculated was 123, out of 120 secs."
"01-01-1998 01:06:24 AM","ET","","Program Disconnected event."