I'm trying to learn BeautifulSoup so I can extract the text from NYT politics articles. The code I have right now gets through two paragraphs, but then it fails with AttributeError: 'NoneType' object has no attribute 'get_text'. I looked up this error, and some threads claim it comes from using deprecated BeautifulSoup 3 functions, but that doesn't seem to be the problem here. Any ideas?
Code:
import requests
from urllib import request, response, error, parse
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "https://www.nytimes.com/2020/02/10/us/politics/trump-manchin-impeachment.html"
html = urlopen(url)
soup = BeautifulSoup(html, "html.parser")
title = soup.title
titleText = title.get_text()
body = soup.find('article', class_='css-1vxca1d')
section = soup.find('section', class_="css-1r7ky0e")
for elem in section:
    div1 = elem.findAll('div')
    for x in div1:
        div2 = elem.findAll('div')
        for i in div2:
            text = i.find('p').get_text()
            print(text)
            print("----------")
Output:
WASHINGTON — Senator Joe Manchin III votes with President Trump more than any other Democrat in the Senate. But his vote last week to convict Mr. Trump of impeachable offenses has eclipsed all of that, earning him the rage of a president who coveted a bipartisan acquittal.
----------
“Munchkin means that you’re small, right?” he said. “I’m bigger than him — of course he has me by weight, now, he has more volume than I have by about 30 or 40 pounds. I’m far from being weak and pathetic, and I’m far from being a munchkin, and I still want him to succeed as president of the United States.”
----------
Traceback (most recent call last):
  File "/Users/user/PycharmProjects/project2/webscrapper.py", line 25, in <module>
    text = i.find('p').get_text()
AttributeError: 'NoneType' object has no attribute 'get_text'

Process finished with exit code 1
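
For reference, the traceback is consistent with find('p') returning None for a div that has no <p> inside it. BeautifulSoup's find() returns None when nothing matches, and calling get_text() on None raises exactly this AttributeError. Below is a minimal sketch of a guard against that. It assumes the failing div is something like an ad or image wrapper, which I have not confirmed on the real page. SAMPLE_HTML and paragraph_texts are hypothetical names made up for illustration, and the markup is a stand-in rather than the actual NYT structure.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the article markup; the real NYT page differs.
SAMPLE_HTML = """
<section>
  <div><p>First paragraph.</p></div>
  <div><p>Second paragraph.</p></div>
  <div><span>A wrapper with no paragraph inside.</span></div>
</section>
"""


def paragraph_texts(html):
    """Return the text of the first <p> in each <div>, skipping divs without one."""
    soup = BeautifulSoup(html, "html.parser")
    section = soup.find("section")
    if section is None:  # find() returns None when nothing matches
        return []
    texts = []
    for div in section.find_all("div"):
        p = div.find("p")
        if p is None:  # this div has no <p>, so there is nothing to extract
            continue
        texts.append(p.get_text())
    return texts


if __name__ == "__main__":
    for text in paragraph_texts(SAMPLE_HTML):
        print(text)
        print("----------")
```

If only the paragraph text is needed, calling section.find_all("p") directly may be simpler, since it returns just the <p> tags and never yields None.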