I suggest splitting the text into sentences with NLTK and then checking whether the string is present in each sentence.
import nltk, re
text = "Cash taxes paid, net of refunds, were $412 million 2016. The U.S. Tax Act imposed a mandatory one-time tax on accumulated earnings of foreign subsidiaries and changed how foreign earnings are subject to U.S. tax."
sentences = nltk.sent_tokenize(text)
word = "subsidiaries"
print([sent for sent in sentences if word in sent])
# => ['The U.S. Tax Act imposed a mandatory one-time tax on accumulated earnings of foreign subsidiaries and changed how foreign earnings are subject to U.S. tax.']
To keep only declarative sentences (those ending with a period), append sent.endswith('.') to the filter with and:
print([sent for sent in sentences if word in sent and sent.endswith('.')])
You can also make sure the word you filter by matches only as a whole word by using a regex search:
print([sent for sent in sentences if re.search(r'\b{}\b'.format(re.escape(word)), sent)])
# => ['The U.S. Tax Act imposed a mandatory one-time tax on accumulated earnings of foreign subsidiaries and changed how foreign earnings are subject to U.S. tax.']
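The difference between the substring check and the whole-word check is easiest to see with a word like "tax", which also occurs inside "taxes". Below is a minimal sketch that wraps both filters in one helper. The function name filter_sentences and its declarative_only parameter are hypothetical, not part of NLTK, and the sentences list is written by hand here as a stand-in for what nltk.sent_tokenize would return.

```python
import re

def filter_sentences(sentences, word, declarative_only=False):
    """Return the sentences that contain `word` as a whole word.

    Hypothetical helper for illustration; `sentences` is assumed to be
    a list of strings such as the output of nltk.sent_tokenize(text).
    """
    # re.escape keeps regex metacharacters in `word` literal
    pattern = re.compile(r'\b{}\b'.format(re.escape(word)))
    return [
        sent for sent in sentences
        if pattern.search(sent)
        and (not declarative_only or sent.endswith('.'))
    ]

# Hand-written stand-in for tokenizer output
sentences = [
    "Cash taxes paid, net of refunds, were $412 million 2016.",
    "Did the U.S. Tax Act impose a one-time tax?",
]

# Plain substring check: "tax" also matches inside "taxes"
print([sent for sent in sentences if "tax" in sent])

# Whole-word check: only the second sentence contains "tax" on its own
print(filter_sentences(sentences, "tax"))

# Whole-word check restricted to sentences ending with a period
print(filter_sentences(sentences, "tax", declarative_only=True))
```

The search stays case-sensitive, as in the snippets above, so "Tax" in "Tax Act" does not count as a match for "tax".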