I'm trying to read all of the text on a web page, but I only get the part that is not hidden. The page has a "Read more" ("Mehr lesen") action button that hides part of the text.
<action-button type="'link'" action="$ctrl.toggleDescription()" translate-key="$ctrl.showFullDescription ? 'COMPONENT.SEO_PAGE.LESS' : 'COMPONENT.SEO_PAGE.MORE'"><button type="submit" class="ActionButtonComponent-action link dark" ink-ripple="" ng-class="[$ctrl.type, $ctrl.theme, $ctrl.loading ? '__loading' : '']" ng-click="$ctrl.callback($event)" ng-disabled="$ctrl.inactive || $ctrl.loading"><span class="ActionButtonComponent-action-txt" translate="$ctrl.translateKey">Mehr lesen</span><ng-transclude></ng-transclude><div class="ActionButtonComponent-action-loading"></div><div class="ink-ripple"></div></button></action-button>
The code I use to read the page:
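import os
from collections import Counter

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "url_to_read"
headers = {'user-agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36"}
keys_folder = "Keys"
excel_file = "excel_file.xlsx"

def getHTML(url):
    # Fetch the page and parse the HTML returned by the server
    full_html = requests.get(url, headers=headers).text
    soup = BeautifulSoup(full_html, features="lxml")
    # Drop the link targets from all anchors
    for a in soup.find_all('a'):
        del a["href"]
    # Return the page text, skipping the first 2500 characters
    return soup.text[2500:]

def getKeyWords(excel_file):
    # Read the keywords from the "Query" column of the workbook
    df = pd.read_excel(os.path.join(keys_folder, excel_file))
    return df["Query"]

def clean(paragraphs):
    # Strip <p> and </p> tags from each paragraph string
    pars = []
    for p in paragraphs:
        p = p.replace("<p>", "")
        p = p.replace("</p>", "")
        pars.append(p)
    return pars

def freq(html, key_words):
    # Count case-insensitive occurrences of each keyword followed by a space
    kv = []
    for s in key_words:
        s += " "
        a = {s: html.lower().count(s.lower())}
        kv.append(a)
    return kv

key_words = getKeyWords(excel_file)
html = getHTML(url)
freqs = freq(html, key_words)

# Merge the per-keyword dicts into a single counter
result = Counter()
for elem in freqs:
    for key, value in elem.items():
        result[key] += value

df = pd.DataFrame(list(result.items()), columns=["Query", "Count"])
df.to_excel(os.path.join("Results", "Result " + excel_file[:-5] + ".xlsx"))
print(df)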
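For reference, here is the counting step on its own, as a minimal sketch that runs without the page or the Excel file. The name `count_keywords`, the sample text, and the sample keywords are made up for illustration; the logic mirrors `freq` above (lowercase everything, then count each keyword followed by a space).

```python
from collections import Counter


def count_keywords(text, keywords):
    """Count case-insensitive occurrences of each keyword followed by a space."""
    counts = Counter()
    lowered = text.lower()
    for kw in keywords:
        needle = kw + " "  # same trailing space as in freq()
        counts[needle] += lowered.count(needle.lower())
    return counts


# Hypothetical sample data, not taken from the real page
sample_text = "Mehr lesen ist gut. Mehr Text, mehr lesen macht Spass."
sample_keywords = ["mehr", "lesen", "spass"]

result = count_keywords(sample_text, sample_keywords)
print(dict(result))
```

Because of the trailing space, a keyword directly followed by punctuation (like "Spass." here) is not counted.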
Can anyone help me with this?