How to use the shell command "sed" to obfuscate the information that follows a specific string
0 votes
/ April 20, 2020

I'm trying to write a shell "sed" or "grep" command that replaces the information following "Scraped from" with a single "*".

For example, the sample file contains:

2016-12-09 18:57:32 [scrapy.core.engine] INFO: Spider opened
2016-12-09 18:57:32 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-12-09 18:57:32 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-12-09 18:57:32 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
2016-12-09 18:57:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/> (referer: None)
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”', 'tags': ['change', 'deep-thoughts', 'thinking', 'world'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': '“It is our choices, Harry, that show what we truly are, far more than our abilities.”', 'tags': ['abilities', 'choices'], 'author': 'J.K. Rowling'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”', 'tags': ['inspirational', 'life', 'live', 'miracle', 'miracles'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': '“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”', 'tags': ['aliteracy', 'books', 'classic', 'humor'], 'author': 'Jane Austen'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': "“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”", 'tags': ['be-yourself', 'inspirational'], 'author': 'Marilyn Monroe'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': '“Try not to become a man of success. Rather become a man of value.”', 'tags': ['adulthood', 'success', 'value'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>

The output should be:

2016-12-09 18:57:32 [scrapy.core.engine] INFO: Spider opened
2016-12-09 18:57:32 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-12-09 18:57:32 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-12-09 18:57:32 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
2016-12-09 18:57:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/> (referer: None)
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”', 'tags': ['change', 'deep-thoughts', 'thinking', 'world'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': '“It is our choices, Harry, that show what we truly are, far more than our abilities.”', 'tags': ['abilities', 'choices'], 'author': 'J.K. Rowling'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”', 'tags': ['inspirational', 'life', 'live', 'miracle', 'miracles'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': '“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”', 'tags': ['aliteracy', 'books', 'classic', 'humor'], 'author': 'Jane Austen'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': "“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”", 'tags': ['be-yourself', 'inspirational'], 'author': 'Marilyn Monroe'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': '“Try not to become a man of success. Rather become a man of value.”', 'tags': ['adulthood', 'success', 'value'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *

I know that you can use sed 's/bla/BLA/g' to do a substitution, but in my case I need to replace the information that follows a specific string, and I'm not sure how to do that.

Answers [ 2 ]

1 vote
/ April 20, 2020

The question asks "to obfuscate the information that follows "Scraped from" with a single "*"".

So simply replace everything that follows "Scraped from " with a single *:

sed 's/Scraped from .*/Scraped from */'
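
One way to check this command is to feed it two lines taken from the sample log in the question, one that should change and one that should not. The sketch below wraps the same sed expression in a shell function; the name mask_scraped is made up for illustration and is not part of the answer.

```shell
# Hypothetical helper name; the sed expression is the one from the answer.
# In the pattern, ".*" matches the rest of the line;
# in the replacement, "*" is an ordinary literal character.
mask_scraped() {
  sed 's/Scraped from .*/Scraped from */'
}

# Two lines from the question's sample log.
printf '%s\n' \
  '2016-12-09 18:57:32 [scrapy.core.engine] INFO: Spider opened' \
  '2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>' \
  | mask_scraped
```

The first line passes through unchanged, and the second ends in "Scraped from *". Lines that follow a "Scraped from" line, such as the {'text': ...} records, are untouched because sed works on one line at a time.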

0 votes
/ April 20, 2020

Here is a solution that preserves the punctuation (or the lack of it) after the keyword from. It also assumes that you want this change only after the key phrase Scraped from, and not after "any" from.

sed -E 's/(Scraped from[:=]?).*/\1 */g' sample_file
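
To see what the optional [:=]? and the \1 back-reference do, the same expression can be tried on three variants of the key phrase. This is a minimal sketch that reads from standard input instead of sample_file; the function name mask_keep_punct is hypothetical.

```shell
# Hypothetical helper name; the sed expression is the one from the answer.
mask_keep_punct() {
  # -E selects extended regular expressions.
  # Group 1 captures "Scraped from" plus an optional ":" or "=";
  # \1 writes that text back, followed by " *".
  sed -E 's/(Scraped from[:=]?).*/\1 */g'
}

printf '%s\n' \
  'DEBUG: Scraped from <200 http://quotes.toscrape.com/>' \
  'DEBUG: Scraped from: <200 http://quotes.toscrape.com/>' \
  'DEBUG: Scraped from= <200 http://quotes.toscrape.com/>' \
  | mask_keep_punct
```

The three lines come out as "DEBUG: Scraped from *", "DEBUG: Scraped from: *" and "DEBUG: Scraped from= *". Because .* already consumes the rest of the line, at most one match per line is possible, so the g flag has no effect here.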

Handling two from clauses on the same line is a bit trickier. Here is one way to do it.

Sample file (simplified):

cat sample_file

2016-12-09 [scrapy.core.engine] INFO: Spider opened
2016-12-09 [scrapy.logstats] INFO: Scraped 0 items (at 0 items/min)
2016-12-09 [scrapy.ext.telnet] DEBUG: Telnet listening on 127.0.0.1:6023
2016-12-09 [scrapy] DEBUG: Crawled (200) <GET http://quotes.com/> (ref: None)
2016-12-09 [scrapy] DEBUG: Scraped from= <200 http://quotes.toscrape.com/>
2016-12-09 [scrapy] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
2016-12-09 [scrapy] DEBUG: Scraped from: <200 http://first/> and from: me.org
2016-12-09 [scrapy.core.scraper] DEBUG: Scraped from <3 http://toscrape.com/>

Solution and output:

sed -E 's/(Scraped from[:=]?) .*and from/\1 * and from/;
        s/(Scraped( from[:=]? \* and)? from[:=]?).*$/\1 */' sample_file

2016-12-09 [scrapy.core.engine] INFO: Spider opened
2016-12-09 [scrapy.logstats] INFO: Scraped 0 items (at 0 items/min)
2016-12-09 [scrapy.ext.telnet] DEBUG: Telnet listening on 127.0.0.1:6023
2016-12-09 [scrapy] DEBUG: Crawled (200) <GET http://quotes.com/> (ref: None)
2016-12-09 [scrapy] DEBUG: Scraped from= *
2016-12-09 [scrapy] DEBUG: Scraped from *
2016-12-09 [scrapy] DEBUG: Scraped from: * and from: *
2016-12-09 [scrapy.core.scraper] DEBUG: Scraped from *
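
The two substitutions run in order on every line, so it can help to look at the intermediate result. The sketch below applies first only the first expression and then both, to the one sample line that contains two from clauses; the variable names line, step1 and both are illustrative only.

```shell
line='2016-12-09 [scrapy] DEBUG: Scraped from: <200 http://first/> and from: me.org'

# Step 1 only: mask the text between the first "from" and "and from".
step1=$(printf '%s\n' "$line" |
  sed -E 's/(Scraped from[:=]?) .*and from/\1 * and from/')
printf '%s\n' "$step1"

# Steps 1 and 2: the second expression masks whatever follows the last "from".
# Its optional group ( from[:=]? \* and)? matches the text produced by step 1,
# so the already-masked first clause is kept inside \1.
both=$(printf '%s\n' "$line" |
  sed -E 's/(Scraped from[:=]?) .*and from/\1 * and from/;
          s/(Scraped( from[:=]? \* and)? from[:=]?).*$/\1 */')
printf '%s\n' "$both"
```

After step 1 the line ends in "Scraped from: * and from: me.org"; after both steps it ends in "Scraped from: * and from: *". On lines with a single from clause, step 1 does not match and step 2 alone does the masking.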
...