Question

Given the test string:

teststr= 'chapter 1 Here is a block of text from chapter one.  chapter 2 Here is another block of text from the second chapter.  chapter 3 Here is the third and final block of text.'

I want to create a list of results like this:

result=['chapter 1 Here is a block of text from chapter one.','chapter 2 Here is another block of text from the second chapter.','chapter 3 Here is the third and final block of text.']

Using re.findall('chapter [0-9]',teststr)

I get ['chapter 1', 'chapter 2', 'chapter 3']

That would be fine if all I wanted were the chapter numbers, but I want the chapter number plus all of the text up to the next chapter number. For the last chapter, I want the chapter number and the text up to the end of the string.

Trying re.findall('chapter [0-9].*',teststr) gives a greedy result, a single item containing the whole string: ['chapter 1 Here is a block of text from chapter one.  chapter 2 Here is another block of text from the second chapter.  chapter 3 Here is the third and final block of text.']

I'm not very good with regular expressions, so any help would be appreciated.

Wiktor Stribiżew · Answer 1 · March 13, 2020

You can use

import re
teststr= 'chapter 1 Here is a block of text from chapter one.  chapter 2 Here is another block of text from the second chapter.  chapter 3 Here is the third and final block of text.'
my_result = [x.strip() for x in re.split(r'(?!^)(?=chapter \d)', teststr)]
print( my_result )
# => ['chapter 1 Here is a block of text from chapter one.', 'chapter 2 Here is another block of text from the second chapter.', 'chapter 3 Here is the third and final block of text.']

The regular expression (?!^)(?=chapter \d) means:

(?!^) - find a position that is not at the start of the string and that
(?=chapter \d) - is immediately followed by chapter, a space, and any digit.
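To make the role of each lookaround concrete, the following small sketch compares the split with and without the (?!^) guard. It assumes Python 3.7 or later, where re.split accepts patterns that can match an empty string; the names without_guard and with_guard are illustrative only.

```python
import re

teststr = ('chapter 1 Here is a block of text from chapter one.  '
           'chapter 2 Here is another block of text from the second chapter.  '
           'chapter 3 Here is the third and final block of text.')

# Lookahead only: position 0 also qualifies, so the first piece is an empty string.
without_guard = re.split(r'(?=chapter \d)', teststr)

# With (?!^): the start of the string is excluded, so no empty piece appears.
with_guard = re.split(r'(?!^)(?=chapter \d)', teststr)

print(len(without_guard), repr(without_guard[0]))
print(len(with_guard), repr(with_guard[0]))
```

The first piece of with_guard still ends with the two spaces that preceded "chapter 2", because a zero-width split removes nothing from the string.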

The pattern is used to split the string at the matched positions and does not consume any characters, so the whitespace left at the edges of each piece is stripped in the list comprehension.
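For readers who would rather stay with re.findall, as in the question, one possible alternative (a sketch, not part of the answer above) is a lazy match that stops at a lookahead for the next heading or the end of the string. The pattern and the name found are assumptions made for illustration.

```python
import re

teststr = ('chapter 1 Here is a block of text from chapter one.  '
           'chapter 2 Here is another block of text from the second chapter.  '
           'chapter 3 Here is the third and final block of text.')

# .*? expands one character at a time until the lookahead succeeds:
# either optional whitespace followed by the next "chapter <digit>", or the end.
found = re.findall(r'chapter \d.*?(?=\s*chapter \d|$)', teststr)

for item in found:
    print(item)
```

Because the lookahead absorbs the whitespace before the next heading without consuming it, no strip() call is needed here. If the text spans several lines, passing re.DOTALL would probably be required so that . also matches newlines.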

Ed Ward · Answer 2 · March 12, 2020

If you don't need to use a regular expression, try the following:

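def split(text):
    chapters = []

    this_chapter = ""
    for i, c in enumerate(text):
        # A new chapter starts where "chapter " is followed by a digit.
        # The slice text[i+8:i+9] avoids an IndexError when the text ends with "chapter ".
        if text.startswith("chapter ", i) and text[i+8:i+9].isdigit():
            if this_chapter.strip():
                chapters.append(this_chapter.strip())
            this_chapter = c
        else:
            this_chapter += c

    # The last chapter has no following heading, so append it after the loop.
    chapters.append(this_chapter.strip())

    return chapters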

print(split('chapter 1 Here is a block of text from chapter one.  chapter 2 Here is another block of text from the second chapter.  chapter 3 Here is the third and final block of text.'))

Output:

['chapter 1 Here is a block of text from chapter one.', 'chapter 2 Here is another block of text from the second chapter.', 'chapter 3 Here is the third and final block of text.']
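The same idea can also be sketched with str.find, which jumps from one candidate heading to the next instead of inspecting every character. This is only an illustrative variant: the function name split_chapters and the marker parameter are hypothetical, and unlike the function above it drops any text that appears before the first heading.

```python
def split_chapters(text, marker="chapter "):
    # Collect the start index of every marker that is followed by a digit.
    starts = []
    pos = text.find(marker)
    while pos != -1:
        after = pos + len(marker)
        if text[after:after + 1].isdigit():
            starts.append(pos)
        pos = text.find(marker, pos + 1)

    # Each chapter runs from its own start to the next start (or the end of the text).
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


teststr = ('chapter 1 Here is a block of text from chapter one.  '
           'chapter 2 Here is another block of text from the second chapter.  '
           'chapter 3 Here is the third and final block of text.')
print(split_chapters(teststr))
```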

Juan C · Answer 3 · March 12, 2020

You are looking for re.split. Assuming up to 99 chapters:

import re
teststr= 'chapter 1 Here is a block of text from chapter one.  chapter 2 Here is another block of text from the second chapter.  chapter 3 Here is the third and final block of text.'

chapters = [i.strip() for i in re.split(r'chapter \d{1,2}', teststr)[1:]]
print(chapters)

Output:

['Here is a block of text from chapter one.',
 'Here is another block of text from the second chapter.',
 'Here is the third and final block of text.']
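Note that the output above no longer contains the "chapter N" headings, because re.split discards the text matched by the separator. If the headings should be kept, as the question asks, one possible adjustment is to wrap the pattern in a capturing group so that re.split returns the separators too, and then re-join each heading with its body. The names parts and chapters_with_headings are illustrative.

```python
import re

teststr = ('chapter 1 Here is a block of text from chapter one.  '
           'chapter 2 Here is another block of text from the second chapter.  '
           'chapter 3 Here is the third and final block of text.')

# With a capturing group, the matched separators are included in the result:
# [text before first heading, heading 1, body 1, heading 2, body 2, ...]
parts = re.split(r'(chapter \d{1,2})', teststr)

# Headings sit at odd indices and bodies at the following even indices.
chapters_with_headings = [
    (heading + body).strip()
    for heading, body in zip(parts[1::2], parts[2::2])
]
print(chapters_with_headings)
```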

In Python, how do I extract multiple blocks of text that start with the same pattern but have no clear end?
