Convert a log.txt file to a JSON file
05 July 2018

I need to convert a log file to a JSON file so I can train an unsupervised model. The log file has this format:

40.77.167.191, 172.16.30.15 - - [08/May/2018:03:29:15 +0530] "GET /speedwav-full-chrome-side-beading-for-tata-indigo-cs-46901.html HTTP/1.1" 403 162 <0.000> <-> "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
66.249.79.25, 172.16.30.15 - - [08/May/2018:03:29:17 +0530] "GET /schneider-dc-control-relays-ca4kn31-t008000721.html HTTP/1.1" 200 14443 <0.445> <0.445> "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.79.25, 172.16.30.15 - - [08/May/2018:03:29:19 +0530] "GET /ajax/pdp/recentlyviewed/1184932 HTTP/1.1" 200 2 <0.089> <0.089> "https://www.tolexo.com/orient-18w-eternal-surface-panel-square-led-light-18w01-t14ori0043.html" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
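Because every line follows the same layout, one regular expression with named groups can split an entry into fields before anything is written as JSON. This is a minimal sketch that assumes only the layout visible in the three sample lines. The field names are hypothetical, and reading the two <...> values as request time and upstream response time is a guess; the log itself does not say what they are.

```python
import re

# Hypothetical field names; the layout is inferred from the sample lines only.
LOG_RE = re.compile(
    r'(?P<client_ip>[^,]+), (?P<proxy_ip>\S+) '
    r'(?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'<(?P<request_time>[^>]*)> <(?P<upstream_time>[^>]*)> '  # assumed meaning
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"$'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    # Convert the numeric fields; "-" means the value is missing
    entry["status"] = int(entry["status"])
    entry["size"] = None if entry["size"] == "-" else int(entry["size"])
    return entry
```

For the first sample line, parse_line returns a dict whose "status" is 403 and whose "upstream_time" is "-". Lines that do not match return None, so they can be counted or logged instead of being silently dropped.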

First I want to get the file into this format, with the entries separated by blank lines:

40.77.167.191, 172.16.30.15 - - [08/May/2018:03:29:15 +0530] "GET /speedwav-full-chrome-side-beading-for-tata-indigo-cs-46901.html HTTP/1.1" 403 162 <0.000> <-> "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

66.249.79.25, 172.16.30.15 - - [08/May/2018:03:29:17 +0530] "GET /schneider-dc-control-relays-ca4kn31-t008000721.html HTTP/1.1" 200 14443 <0.445> <0.445> "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.79.25, 172.16.30.15 - - [08/May/2018:03:29:19 +0530] "GET /ajax/pdp/recentlyviewed/1184932 HTTP/1.1" 200 2 <0.089> <0.089> "https://www.tolexo.com/orient-18w-eternal-surface-panel-square-led-light-18w01-t14ori0043.html" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Then I want to create a JSON file from it.
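Once each entry is a dict (whatever fields you extract), there are two common ways to store it: one JSON array for the whole file, or JSON Lines, one object per line, which is easy to stream when feeding a model. A minimal sketch with the standard json module; the function names are illustrative, not from any library.

```python
import json


def to_json_array(records):
    """Serialize all records as a single JSON array (one document)."""
    return json.dumps(list(records), ensure_ascii=False, indent=2)


def to_json_lines(records):
    """Serialize records as JSON Lines: one compact JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def save(text, path):
    """Write already-serialized text to `path` as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
```

Usage would look like save(to_json_lines(records), "log.jsonl"), where records is your list of dicts and "log.jsonl" is a hypothetical output name. JSON Lines lets a training loop read one entry at a time without loading the whole file.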

1 Answer

05 July 2018

Use re.split, splitting on the "client IP, proxy IP" pair that starts each entry.

Ex:

import re

s = """40.77.167.191, 172.16.30.15 - - [08/May/2018:03:29:15 +0530] "GET /speedwav-full-chrome-side-beading-for-tata-indigo-cs-46901.html HTTP/1.1" 403 162 <0.000> <-> "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" 66.249.79.25, 172.16.30.15 - - [08/May/2018:03:29:17 +0530] "GET /schneider-dc-control-relays-ca4kn31-t008000721.html HTTP/1.1" 200 14443 <0.445> <0.445> "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 66.249.79.25, 172.16.30.15 - - [08/May/2018:03:29:19 +0530] "GET /ajax/pdp/recentlyviewed/1184932 HTTP/1.1" 200 2 <0.089> <0.089> "https://www.tolexo.com/orient-18w-eternal-surface-panel-square-led-light-18w01-t14ori0043.html" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"""
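# Split on the "client IP, proxy IP" prefix of each entry; the capturing group
# keeps the prefixes in the result, and [1:] drops the empty string before the first one
val = re.split(r"(\d+\.\d+\.\d+\.\d+, \d+\.\d+\.\d+\.\d+)", s)[1:]
# Pair each prefix (even indices) with the rest of its entry (odd indices)
for v, w in zip(val[::2], val[1::2]):
    print((v, w))  # print as a tuple so Python 2 and 3 give the output below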

Output:

('40.77.167.191, 172.16.30.15', ' - - [08/May/2018:03:29:15 +0530] "GET /speedwav-full-chrome-side-beading-for-tata-indigo-cs-46901.html HTTP/1.1" 403 162 <0.000> <-> "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" ')
('66.249.79.25, 172.16.30.15', ' - - [08/May/2018:03:29:17 +0530] "GET /schneider-dc-control-relays-ca4kn31-t008000721.html HTTP/1.1" 200 14443 <0.445> <0.445> "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" ')
('66.249.79.25, 172.16.30.15', ' - - [08/May/2018:03:29:19 +0530] "GET /ajax/pdp/recentlyviewed/1184932 HTTP/1.1" 200 2 <0.089> <0.089> "https://www.tolexo.com/orient-18w-eternal-surface-panel-square-led-light-18w01-t14ori0043.html" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)')
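The output has this shape because of how re.split treats capturing groups: text matched by a group in the pattern is returned in the result list alongside the pieces between matches. Since the log starts with a match, the first element is an empty string, which is why the code slices with [1:] and then pairs even and odd positions. A toy illustration with made-up strings:

```python
import re

text = "A1 x B2 y"

# Without a group, the matched separators are dropped
plain = re.split(r"[A-Z]\d", text)      # ['', ' x ', ' y']

# With a capturing group, each match is kept between the pieces
parts = re.split(r"([A-Z]\d)", text)    # ['', 'A1', ' x ', 'B2', ' y']

# Drop the leading '' and pair each match with the text that follows it
val = parts[1:]
pairs = list(zip(val[::2], val[1::2]))  # [('A1', ' x '), ('B2', ' y')]
```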
...
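The code above stops at printing pairs, while the question also asks for JSON. One way to finish it, as a sketch that keeps the same regex and wraps each pair in a dict: the key names client_ip, proxy_ip and entry are hypothetical, and the rest of each entry stays as one string rather than being split into fields.

```python
import json
import re

# Same pattern as above: the "client IP, proxy IP" pair that starts each entry
PREFIX_RE = r"(\d+\.\d+\.\d+\.\d+, \d+\.\d+\.\d+\.\d+)"


def blob_to_records(s):
    """Split a blob of log entries into dicts (hypothetical key names)."""
    val = re.split(PREFIX_RE, s)[1:]
    records = []
    for ips, rest in zip(val[::2], val[1::2]):
        client_ip, proxy_ip = ips.split(", ")
        records.append({
            "client_ip": client_ip,
            "proxy_ip": proxy_ip,
            "entry": rest.strip(),  # remainder of the log line, unparsed
        })
    return records


def blob_to_json(s):
    """Serialize the records as a pretty-printed JSON array."""
    return json.dumps(blob_to_records(s), indent=2)
```

If log.txt really has one entry per line, as in the question, iterating over the file line by line is simpler than re.split; the split only matters when several entries have been joined into one string, as in the variable s above.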