Problems with Bs4 and Python - PullRequest
0 votes
/ May 29, 2018
import requests

from bs4 import BeautifulSoup

# Browser-like User-Agent; the two adjacent string literals are concatenated into one header value.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'}
url = 'https://edition.cnn.com/'
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, "html.parser")
# Collect every <h3> with class "cd__headline" and print the text and href of each link inside it.
al = soup.find_all("h3", attrs={'class': 'cd__headline'})
for divv in al:
    for links in divv.find_all('a'):
        print(links.text)
        print(links.get('href'))

I'm trying to extract the headlines from CNN. I'm giving the soup the correct HTML element and class, but the output is empty, and I don't get any error or traceback.
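A quick way to narrow down this kind of problem is to check whether the elements exist in the HTML that `requests` actually downloaded, which is not necessarily the DOM a browser shows after running JavaScript. The sketch below uses the standard library's `html.parser` instead of BeautifulSoup so it has no extra dependencies; the names `HeadlineCounter` and `diagnose` are hypothetical, and the messages are only heuristics.

```python
from html.parser import HTMLParser


class HeadlineCounter(HTMLParser):
    """Count <h3> start tags whose class attribute contains class_name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "h3":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def diagnose(raw_html, class_name="cd__headline"):
    """Hypothetical helper: explain why find_all("h3", class_=...) might be empty."""
    counter = HeadlineCounter(class_name)
    counter.feed(raw_html)
    counter.close()
    if counter.count:
        return f"{counter.count} matching <h3> element(s) in the raw HTML"
    if class_name in raw_html:
        # Text inside <script> is not parsed as tags, so a match there lands here.
        return ("class name appears in the raw HTML but not on an <h3>; "
                "it may be rendered by JavaScript")
    return ("no matching elements in the raw HTML; the content is probably "
            "rendered by JavaScript (or a different page was served)")
```

With the question's variables, `print(diagnose(page.text))` would report which of the three cases applies.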

1 Answer

0 votes
/ May 30, 2018

The web page is generated dynamically from JSON embedded in a script element in the HTML. You can extract that JSON and parse it to get the data you need or, as you said in your comment above, use Selenium to render the JavaScript on the page. To extract the JSON:

import requests
import json
from bs4 import BeautifulSoup

url = 'https://edition.cnn.com/'
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
# Find the script element containing the JSON the web page is dynamically generated from.
anchor = "var CNN = CNN || {};CNN.isWebview = false;CNN.contentModel = "
s = soup.find(lambda tag: tag.name == "script" and anchor in tag.text)
if s is None:
    raise SystemExit("Script with the embedded JSON not found; the page layout may have changed.")
# Extract the JSON: from just before "articleList" to just past the first "}]"
# (these offsets are tied to the page layout as of May 2018).
j = s.text[s.text.find("articleList")-2:s.text.find("}]")+4]
# Load the JSON.
d = json.loads(j)
# Print the headline of each article in the JSON.
for article in d['articleList']:
    print(article['headline'])

Output:

Here's how the show's cast reacted to the rant
Wanda Sykes quit show before it was cancelled
ABC took a moral stand on Roseanne. Spoiler alert: Trump won't.
<strong>Your questions on the 'Spider-Man' photo, answered</strong>
Trump, without proof, says Mueller team will meddle in 2018 elections
Trump wins by demonizing Mueller
2 police officers, passerby killed in Belgium
MH370 search ends but mystery remains
Israel responds to Gaza fire with airstrikes
French Open: Serena, Sharapova win
Duterte will 'go to war' over South China Sea
Giuliani gets booed on his birthday
<strong>Childhood obesity highest in home of Mediterranean diet</strong>
Top North Korea official heading to US to revive Trump talks
Suspected serial killer ID'd, but cops 'can't arrest him'
Pre-monsoon storms kill 48 in India
Lava 'river' engulfs home in minutes
Mugabe warned: Be at hearing or face jail 
Why supersonic air travel could boom in Asia 
'Unbreakable:' How tennis star Jelena Dokic overcame 'years of abuse' 
This guy survived Vesuvius eruption -- but not for long
Best travel photos of 2018
Online dating 'lowers self-esteem and increases depression'
Who is North Korea's go-to diplomat?
The best cities for swimming
Vatican unveils radical chapels
Why this country has the best libraries
The architect that changed our cities
<strong>Jill Filipovic:</strong> French Spider-Man's act of bravery you don't know about
<strong>Silvia Marchetti:</strong> Italy's chaos is more dangerous than Brexit
<strong>Jesse Williams and Judith Browne Dianis:</strong> Starbucks' incident proves 'Whites Only' spaces still exist 
<strong>Perez and O'Leary Carmona:</strong> How Trump is dehumanizing Latinos
Moment man climbs building to save child
Flash floods ravage US town 
See North Korea's nuclear tunnels go up in smoke
Meghan laughs off Harry's bee encounter
Blue flames burn during Kilauea eruption
Footage of NBA player's arrest released
Why Dubai is hungry for food delivery apps
Paris in spring? Must be Rafa Nadal time 
Fore! Golfers ignore erupting volcano
Take a tour of the Russia World Cup stadiums
Rugby World Cup 2019 Japan venues
Gorgeous Vietnam: Take a photo tour
Breathtaking architecture found underwater
India's problem with rape: Do women feel safe? 
Afghan who risked life for UK: 'They are sending me to get killed'
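The slice in the answer ends at the first `}]` anywhere in the script, so it breaks silently if the embedded JSON changes shape. A somewhat sturdier sketch keeps the answer's assumption that the object starts at `{"articleList"`, but lets `json.JSONDecoder.raw_decode` determine where that object ends, and it also strips the `<strong>` markup seen in some of the headlines above. The helper names and the sample script string in the test are made up for illustration; CNN's current markup is not guaranteed to contain `articleList` at all.

```python
import json
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect only the text of an HTML fragment, dropping tags like <strong>."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(fragment):
    parser = _TextOnly()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


def extract_headlines(script_text):
    """Hypothetical helper: decode the object starting at '{"articleList"'."""
    start = script_text.find('{"articleList"')
    if start == -1:
        raise ValueError("articleList not found; the page layout has probably changed")
    # raw_decode parses exactly one JSON value beginning at `start` and
    # ignores whatever JavaScript follows it; it returns (value, end_index).
    data, _end = json.JSONDecoder().raw_decode(script_text, start)
    return [strip_tags(article["headline"]) for article in data["articleList"]]
```

In the answer's code, `extract_headlines(s.text)` could stand in for the slicing, the `json.loads` call, and the loop.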