How can I skip to the next part of the text once I have already printed the part I was searching for, in Python?
2 votes
/ May 4, 2011

I would like to search a text file and print a line together with the 3 lines that follow it, if a keyword is found in that line AND a second keyword is found within the following 3 lines.

My code currently prints too much information. Is there a way to jump ahead to the next section of the text once a block has already been printed?

text = """

here is some text 1
I want to print out this line and the following 3 lines only once keyword 2
print this line since it has a keyword2 3
print this line keyword 4
print this line 5
I don't want to print this line but I want to start looking for more text starting at this line 6
Don't print this line 7
Not this line either 8
I want to print out this line again and the following 3 lines only once keyword 9
please print this line keyword 10
please print this line it has the keyword2 11
please print this line 12
Don't print this line 13
Start again searching here 14
etc.
"""

text2 = open("tmp.txt","w")
text2.write(text)
text2.close()

searchlines = open("tmp.txt").readlines()

data = []

for m, line in enumerate(searchlines):
    line = line.lower()
    if "keyword" in line and any("keyword2" in l.lower() for l in searchlines[m:m+4]):
        for line2 in searchlines[m:m+4]:
            data.append(line2)
print ''.join(data)

Current output:

I want to print out this line and the following 3 lines only once keyword 2
print this line since it has a keyword2 3
print this line keyword 4
print this line 5
print this line since it has a keyword2 3
print this line keyword 4
print this line 5
I don't want to print this line but I want to start looking for more text starting at this line 6
I want to print out this line again and the following 3 lines only once keyword 9
please print this line keyword 10
please print this line it has the keyword2 11
please print this line 12
please print this line keyword 10
please print this line it has the keyword2 11
please print this line 12
Don't print this line 13
please print this line it has the keyword2 11
please print this line 12
Don't print this line 13
Start again searching here 14

I would like to print only:

I want to print out this line and the following 3 lines only once keyword 2
print this line since it has a keyword2 3
print this line keyword 4
print this line 5
I want to print out this line again and the following 3 lines only once keyword 9
please print this line keyword 10
please print this line it has the keyword2 11
please print this line 12

Answers [ 4 ]

1 vote
/ May 4, 2011

As someone else noted, your first keyword, keyword, is a substring of your second keyword, keyword2. I therefore implemented this with compiled regular expression objects, so that you can use the word-boundary anchor \b.

import re
from StringIO import StringIO

text = """

here is some text 1
I want to print out this line and the following 3 lines only once keyword 2
print this line since it has a keyword2 3
print this line keyword 4
print this line 5
I don't want to print this line but I want to start looking for more text starting at this line 6
Don't print this line 7
Not this line either 8
I want to print out this line again and the following 3 lines only once keyword 9
please print this line keyword 10
please print this line it has the keyword2 11
please print this line 12
Don't print this line 13
Start again searching here 14
etc.
"""


def my_scan(data,search1,search2):
  buffer = []
  for line in data:
    buffer.append(line)
    if len(buffer) > 4:
      buffer.pop(0)
    if len(buffer) == 4: # Valid search block
      # the second keyword may appear on any of the three following lines
      if search1.search(buffer[0]) and search2.search("\n".join(buffer[1:])):
        for item in buffer:
          yield item
        buffer = []

# First search term
s1 = re.compile(r'\bkeyword\b')
s2 = re.compile(r'\bkeyword2\b')

for row in my_scan(StringIO(text),s1,s2):
  print row.rstrip()

Produces:

I want to print out this line and the following 3 lines only once keyword 2
print this line since it has a keyword2 3
print this line keyword 4
print this line 5
I want to print out this line again and the following 3 lines only once keyword 9
please print this line keyword 10
please print this line it has the keyword2 11
please print this line 12
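
The word-boundary point can be checked in isolation. The following is a minimal sketch in Python 3 syntax (the answer's own code is Python 2); the two sample lines are taken from the question's text, and the variable names are only illustrative.

```python
import re

lines = [
    "print this line keyword 4",
    "print this line since it has a keyword2 3",
]

# Plain substring test: "keyword" is also found inside "keyword2".
substring_hits = ["keyword" in line for line in lines]

# \b requires a word boundary on both sides, so "keyword2" is not matched.
word = re.compile(r"\bkeyword\b")
boundary_hits = [bool(word.search(line)) for line in lines]

print(substring_hits)  # [True, True]
print(boundary_hits)   # [True, False]
```

With the substring test, a line that only contains keyword2 would also be treated as the start of a block, which is why the original code prints overlapping blocks.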
1 vote
/ May 4, 2011

So you want to print every block of 4 lines that contains 2 or more keywords?

Anyway, this is what I just came up with. Maybe you can use it:

text = """

here is some text 1
I want to print out this line and the following 3 lines only once keyword 2
print this line since it has a keyword2 3
print this line keyword 4
print this line 5
I don't want to print this line but I want to start looking for more text starting at this line 6
Don't print this line 7
Not this line either 8
I want to print out this line again and the following 3 lines only once keyword 9
please print this line keyword 10
please print this line it has the keyword2 11
please print this line 12
Don't print this line 13
Start again searching here 14
etc.
""".splitlines()

keywords = ['keyword', 'keyword2']

buffer, kw = [], set()
for line in text:
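    if len(buffer) == 0:                 # first line of a block
        hits = [k for k in keywords if k in line]
        if hits:                         # start a block only on a keyword line
            kw.update(hits)
            buffer.append(line)          # append once, even if several keywords match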
    else:                                # continuous lines
        buffer.append(line)
        for k in keywords:
            if k in line:
                kw.add(k)
        if len(buffer) > 3:
            if len(kw) >= 2:             # just print blocks with enough keywords
                print '\n'.join(buffer)
            buffer, kw = [], set()
0 votes
/ May 4, 2011

First, you can fix your code as follows:

text = """
0//
1// here is some text 1
A2// I want to print out this line and the following 3 lines only once keyword 2
b3// print this line since it has a keyword2 3
b4// print this line keyword 4
b5// print this line 5
6// I don't want to print this line but I want to start looking for more text starting at this line 6
7// Don't print this line 7
8// Not this line either 8
A9// I want to print out this line again and the following 3 lines only once keyword 9
b10// please print this line keyword 10
b11// please print this line it has the keyword2 11
b12// please print this line 12
13// Don't print this line 13
14// Start again searching here 14
15// etc.
"""
searchlines = map(str.lower,text.splitlines(1))
# splitlines(1) with argument 1 keeps the newlines

data,again = [],-1

for m, line in enumerate(searchlines):
    if "keyword" in line and m>again and "keyword2" in ''.join(searchlines[m:m+4]):
        data.extend(searchlines[m:m+4])
        again = m+3   # index of the last printed line; searching resumes at m+4

print ''.join(data)
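
The skip-ahead idea in the fix above can also be written with an explicit index that jumps past each printed block. This is only a sketch in Python 3 syntax, not the answerer's code: the function name find_blocks, its parameters, and the shortened sample text are assumptions made for illustration, and it matches whole words with \b so that keyword does not match inside keyword2.

```python
import re


def find_blocks(lines, first, second, size=4):
    """Return non-overlapping blocks of `size` lines (hypothetical helper).

    A block starts at a line containing `first` as a whole word and must
    contain `second` as a whole word somewhere within the block.
    """
    first_re = re.compile(r"\b%s\b" % re.escape(first), re.IGNORECASE)
    second_re = re.compile(r"\b%s\b" % re.escape(second), re.IGNORECASE)
    blocks = []
    i = 0
    while i < len(lines):
        block = lines[i:i + size]
        if first_re.search(lines[i]) and any(second_re.search(l) for l in block):
            blocks.append(block)
            i += size   # resume searching right after the printed block
        else:
            i += 1
    return blocks


# Shortened sample in the spirit of the question's text.
text = """here is some text 1
first block keyword 2
it has a keyword2 3
keyword 4
plain line 5
not printed, but searching resumes here 6
not printed 7
second block keyword 9
keyword 10
it has the keyword2 11
plain line 12
not printed 13
"""

blocks = find_blocks(text.splitlines(), "keyword", "keyword2")
for block in blocks:
    print("\n".join(block))
```

Because the index advances by the block size after a match, lines inside a printed block are never reconsidered as the start of a new block, and the line immediately after the block is the next one examined.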


Second, a short regular expression:

text = """
0//
1// here is some text 1
A2// I want to print out this line and the following 3 lines only once keyword 2
b3// print this line since it has a keyword2 3
b4// print this line keyword 4
b5// print this line 5
6// I don't want to print this line but I want to start looking for more text starting at this line 6
7// Don't print this line 7
8// Not this line either 8
A9// I want to print out this line again and the following 3 lines only once keyword 9
b10// please print this line keyword 10
b11// please print this line it has the keyword2 11
b12// please print this line 12
13// Don't print this line 13
14// Start again searching here 14
15// etc.
"""

import re

regx = re.compile('(^.*?(?<=[ \t]){0}(?=[ \t]).*\r?\n'
                  '.*?((?<=[ \t]){1}(?=[ \t]))?.*\r?\n'
                  '.*?((?<=[ \t]){1}(?=[ \t]))?.*\r?\n'
                  '.*?(?(1)|(?(2)|{1})).*)'.\
                  format('keyword','keyword2'),re.MULTILINE|re.IGNORECASE)

print '\n'.join(m.group(1) for m in regx.finditer(text))

Result:

A2// I want to print out this line and the following 3 lines only once keyword 2
b3// print this line since it has a keyword2 3
b4// print this line keyword 4
b5// print this line 5
b10// please print this line keyword 10
b11// please print this line it has the keyword2 11
b12// please print this line 12
13// Don't print this line 13
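
In the result above, the second block starts at b10 rather than at A9. One possible explanation, offered as an assumption rather than a verified diagnosis, is that the final conditional tests groups 1 and 2, while the two optional keyword2 groups are numbered 2 and 3 (group 1 is the outer capture). The sketch below is a variant of the same pattern with the conditionals renumbered, written in Python 3 syntax; the sample text is shortened and hypothetical.

```python
import re

# Shortened sample in the same layout as the answer's text (hypothetical data).
sample = """\
1// here is some text 1
A2// start of block keyword 2
b3// has a keyword2 3
b4// keyword 4
b5// plain 5
6// no match 6
A9// start of block keyword 9
b10// keyword 10
b11// has the keyword2 11
b12// plain 12
13// no match 13
"""

# Group 1 is the outer capture; the optional keyword2 groups are 2 and 3,
# so the final conditional tests (2) and (3) instead of (1) and (2).
pattern = re.compile(
    '(^.*?(?<=[ \t]){0}(?=[ \t]).*\r?\n'
    '.*?((?<=[ \t]){1}(?=[ \t]))?.*\r?\n'
    '.*?((?<=[ \t]){1}(?=[ \t]))?.*\r?\n'
    '.*?(?(2)|(?(3)|{1})).*)'.format('keyword', 'keyword2'),
    re.MULTILINE | re.IGNORECASE)

blocks = [m.group(1) for m in pattern.finditer(sample)]
print('\n'.join(blocks))
```

On this sample the two matches begin at the A2 and A9 lines. Note that the fallback keyword2 on the last line is still matched without the surrounding whitespace lookarounds, as in the original pattern.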
0 votes
/ May 4, 2011

Your keywords overlap: "keyword" is a substring of "keyword2".

Also, your data implies that you do not want to see line 13, but according to the problem statement it should be printed.

I changed your first keyword from "keyword" to "firstkey", and your code works (apart from line 13).

$ diff /tmp/q /tmp/q2
4c4
< I want to print out this line and the following 3 lines only once keyword 2
---
> I want to print out this line and the following 3 lines only once firstkey 2
6c6
< print this line keyword 4
---
> print this line firstkey 4
11,12c11,12
< I want to print out this line again and the following 3 lines only once keyword 9
< please print this line keyword 10
---
> I want to print out this line again and the following 3 lines only once firstkey 9
> please print this line firstkey 10
30c30
<     if "keyword" in line and any("keyword2" in l.lower() for l in searchlines[m:m+4]):
---
>     if "firstkey" in line and any("keyword2" in l.lower() for l in searchlines[m:m+4]):