I'm collecting quotes from Twitter, and I want to separate the actual quote from its author.
How can I do this when the tweets don't all follow the same format?
I'm new to regex, but here is my best attempt on regex101: https://regex101.com/r/m3WtmX/5.
Below is the code I have. I want every loop iteration to print an sre.SRE_Match object, but the last one prints None.
import re
QUOTE_PATTERN = re.compile(r'^(?P<actual_quote>.*)\s+?-\s*(?P<author>.*)$')
# actual_quote is separated from author by space and dash
format_1 = "Any form of exercise, if pursued continuously, will help train us in perseverance -Mao Tse-Tung"
# separated by one space, dash and another space
format_2 = "Any form of exercise, if pursued continuously, will help train us in perseverance - Mao Tse-Tung"
# actual_quote is surrounded with double quotes character and
# is separated from author by space, dash and another space
format_3 = '"Any form of exercise, if pursued continuously, will help train us in perseverance" - Mao Tse-Tung'
# separated only with dash (no space)
format_4 = "Any form of exercise, if pursued continuously, will help train us in perseverance-Mao Tse-Tung"
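# try the pattern against every format; the loop variable is named tweet
# so that it does not shadow the built-in format()
for tweet in [format_1, format_2, format_3, format_4]:
    print(QUOTE_PATTERN.match(tweet))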
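The pattern above requires at least one whitespace character before the dash (`\s+?`), which is why format_4 does not match. One possible direction, as a rough sketch rather than a definitive fix: first try a dash that is preceded by whitespace, and only if that fails fall back to the first bare dash. The names `SPACED_DASH`, `BARE_DASH` and `split_quote` are hypothetical and not part of the original code. It also assumes that the quote itself contains no hyphenated words when the separator is a bare dash, which may not hold for real tweets.

```python
import re

# Preferred separator: a dash preceded by whitespace (formats 1-3).
# The greedy .* makes the LAST such dash the split point.
SPACED_DASH = re.compile(r'^(?P<actual_quote>.*)\s+-\s*(?P<author>.*)$')

# Fallback separator: the FIRST dash with no whitespace before it (format 4).
# The lazy .*? stops at the earliest dash, so "Tse-Tung" stays in the author.
BARE_DASH = re.compile(r'^(?P<actual_quote>.*?)-\s*(?P<author>.*)$')


def split_quote(tweet):
    """Return (quote, author), or None if the tweet contains no dash at all."""
    match = SPACED_DASH.match(tweet) or BARE_DASH.match(tweet)
    if match is None:
        return None
    # Drop surrounding whitespace and optional double quotes around the quote.
    quote = match.group('actual_quote').strip().strip('"')
    author = match.group('author').strip()
    return quote, author


examples = [
    "Any form of exercise, if pursued continuously, will help train us in perseverance -Mao Tse-Tung",
    "Any form of exercise, if pursued continuously, will help train us in perseverance - Mao Tse-Tung",
    '"Any form of exercise, if pursued continuously, will help train us in perseverance" - Mao Tse-Tung',
    "Any form of exercise, if pursued continuously, will help train us in perseverance-Mao Tse-Tung",
]

for tweet in examples:
    print(split_quote(tweet))
```

With the four sample formats, each call should yield the same quote text and `Mao Tse-Tung` as the author. A tweet such as `A well-known saying-Someone` would be split at the wrong dash by the fallback, so the bare-dash case probably needs extra rules for real data.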