I think this solution should work. It still produces output when there are fewer than 6 words before or after the match in the string. It also matches "risk" properly and will not match something like "risky".
You will need to make some changes to fit your use case.
from bs4 import BeautifulSoup
import urllib.request
import re

url = 'https://www.investing.com/analysis/2-reasons-merck-200373488'
# Send the request with a browser-like User-Agent header
req = urllib.request.Request(
    url,
    data=None,
    headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
    }
)
sauce = urllib.request.urlopen(req).read()
soup = BeautifulSoup(sauce, 'html.parser')

# Matches 'Risk', 'risk.' and 'risk', but NOT 'risky'.
# Inside a character class '.' needs no escape, and '|' would match a literal pipe.
pattern = re.compile(r'risk[. ]', re.IGNORECASE)
no_of_words = 6

# 'string=' is the current name of the older 'text=' argument
for elem in soup.find_all(string=pattern):
    # Avoid naming variables 'str' or 'list': that shadows the built-ins
    text = elem.parent.text
    words = text.split(' ')
    # Append ' ' to each word so that a bare 'risk' conforms to the pattern
    indices = [i for i, x in enumerate(words) if pattern.match(x.strip() + ' ')]
    for index in indices:
        start = max(index - no_of_words, 0)  # clamp at the start of the text
        end = index + no_of_words + 1  # slicing tolerates end > len(words)
        print(' '.join(words[start:end]).strip())
        print("List of Word Before: ", words[start:index])  # words before
        print("List of Words After: ", words[index + 1:end])  # words after
        print()
Output:
Risk Warning
List of Word Before: []
List of Words After: ['Warning']

Risk Disclosure:
List of Word Before: []
List of Words After: ['Disclosure:']

Risk Disclosure: Trading in financial instruments and/or
List of Word Before: []
List of Words After: ['Disclosure:', 'Trading', 'in', 'financial', 'instruments', 'and/or']

cryptocurrencies involves high risks including the risk of losing some, or all, of
List of Word Before: ['cryptocurrencies', 'involves', 'high', 'risks', 'including', 'the']
List of Words After: ['of', 'losing', 'some,', 'or', 'all,', 'of']

investment objectives, level of experience, and risk appetite, and seek professional advice where
List of Word Before: ['investment', 'objectives,', 'level', 'of', 'experience,', 'and']
List of Words After: ['appetite,', 'and', 'seek', 'professional', 'advice', 'where']

investment objectives, level of experience, and risk appetite, and seek professional advice where
List of Word Before: ['investment', 'objectives,', 'level', 'of', 'experience,', 'and']
List of Words After: ['appetite,', 'and', 'seek', 'professional', 'advice', 'where']
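
If you want to try the windowing logic without hitting the network, here is a minimal sketch that works on a plain string. The helper name `keyword_windows` and its parameters are my own, not part of the code above, and it is a variation rather than the same approach: it strips surrounding punctuation from each word instead of using a regex, so 'risk,' and 'risk.' count as matches while 'risky' and 'risks' do not. A hyphenated word such as 'risk-free' would not match under this assumption.

```python
import string


def keyword_windows(text, keyword="risk", width=6):
    """Return (before, match, after) for each standalone occurrence of keyword.

    Hypothetical helper for illustration; 'before' and 'after' hold at most
    'width' words each and are simply shorter near the edges of the text.
    """
    words = text.split()
    target = keyword.lower()
    windows = []
    for i, word in enumerate(words):
        # Drop leading/trailing punctuation so 'risk,' equals 'risk',
        # while 'risky' and 'risks' stay different words.
        if word.strip(string.punctuation).lower() == target:
            start = max(i - width, 0)  # clamp at the start of the text
            windows.append((words[start:i], word, words[i + 1:i + 1 + width]))
    return windows


print(keyword_windows("Risk Warning"))
# [([], 'Risk', ['Warning'])]
```

To use it with the scraper, you would pass `elem.parent.text` in place of the literal string.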