You probably want to display the text of the span element that follows each link. This can be done as follows:
import requests
from bs4 import BeautifulSoup
def beautiful_soup(url):
    '''Send a GET request to url and parse the returned HTML into a BeautifulSoup object.'''
    response = requests.get(url)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    soup = BeautifulSoup(response.text, "lxml")
    #print(soup.prettify())
    return soup

soup = beautiful_soup('https://news.google.com/?hl=en-IN&gl=IN&ceid=IN:en')

# Each headline link is followed by a <span> that holds the headline text
for headlines in soup.find_all('a', {'class': 'VDXfz'}):
    print(headlines.find_next('span').text)
This will give you output starting with something like:
I Take Back My Comment, Says Ram Madhav After Omar Abdullah’s Dare to Prove Pakistan Charge
Ram Madhav Backpedals On "Instruction From Pak" After Omar Abdullah Dare
National Conference backed PDP to save J&K from uncertainty: Omar Abdullah
On Ram Madhav ‘instruction from Pak’ barb, Omar Abdullah’s stinging reply
Make public reports of horse-trading in govt formation in J-K: Omar Abdullah to Guv
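To try the find_next('span') logic without hitting the live site, here is a minimal sketch run against a small HTML snippet. The obfuscated VDXfz class on Google News may change at any time, so the SAMPLE_HTML markup below is a hypothetical stand-in, and the extract_headlines helper is illustrative only. It uses the built-in html.parser instead of lxml so no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a Google News page: each empty
# <a class="VDXfz"> link is followed by a <span> holding the headline.
SAMPLE_HTML = """
<div>
  <article><a class="VDXfz" href="./articles/1"></a><h3><span>First headline</span></h3></article>
  <article><a class="VDXfz" href="./articles/2"></a><h3><span>Second headline</span></h3></article>
</div>
"""


def extract_headlines(html):
    """Return the text of the first <span> that follows each <a class="VDXfz">."""
    soup = BeautifulSoup(html, "html.parser")
    headlines = []
    for link in soup.find_all("a", {"class": "VDXfz"}):
        span = link.find_next("span")  # next <span> in document order
        if span is not None:  # skip links with no span after them
            headlines.append(span.get_text(strip=True))
    return headlines


for headline in extract_headlines(SAMPLE_HTML):
    print(headline)
```

Note that find_next searches everything after the tag in document order, including the tag's own children, so it picks up the nearest following span wherever it sits.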
You can write the headlines to a CSV file using the following approach:
import requests
from bs4 import BeautifulSoup
import csv
def beautiful_soup(url):
    '''Send a GET request to url and parse the returned HTML into a BeautifulSoup object.'''
    response = requests.get(url)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    soup = BeautifulSoup(response.text, "lxml")
    return soup

soup = beautiful_soup('https://news.google.com/?hl=en-IN&gl=IN&ceid=IN:en')

# newline='' stops the csv module from writing blank lines on Windows
with open('output.csv', 'w', newline='', encoding='utf-8') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['Headline'])  # header row

    for headlines in soup.find_all('a', {'class': 'VDXfz'}):
        headline = headlines.find_next('span').text
        print(headline)
        csv_output.writerow([headline])  # one headline per row
At the moment this produces just a single column named Headline.
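If you want more than the single Headline column, one option is to write each link's href alongside its headline. The sketch below is an assumption-laden illustration, not a tested scraper for the live page: SAMPLE_HTML is hypothetical markup, write_headlines_csv is a hypothetical helper, and it writes to an in-memory io.StringIO so it can be checked without creating a file (pass an open file object instead to get output.csv). Relative hrefs such as ./articles/1 are resolved against an assumed base URL with urljoin.

```python
import csv
import io
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://news.google.com/"  # assumed base for resolving relative links

# Hypothetical markup: each <a class="VDXfz"> is followed by a <span> headline.
SAMPLE_HTML = """
<article><a class="VDXfz" href="./articles/1"></a><h3><span>First headline</span></h3></article>
<article><a class="VDXfz" href="./articles/2"></a><h3><span>Second, with a comma</span></h3></article>
"""


def write_headlines_csv(html, f_output, base_url=BASE_URL):
    """Write a Headline column and a Link column (absolute URL) to f_output."""
    soup = BeautifulSoup(html, "html.parser")
    csv_output = csv.writer(f_output)
    csv_output.writerow(["Headline", "Link"])  # header row
    for link in soup.find_all("a", {"class": "VDXfz"}):
        span = link.find_next("span")
        if span is None:
            continue
        # href values may be relative, so resolve them against the base URL
        absolute = urljoin(base_url, link.get("href", ""))
        csv_output.writerow([span.get_text(strip=True), absolute])


buffer = io.StringIO(newline="")
write_headlines_csv(SAMPLE_HTML, buffer)
print(buffer.getvalue())
```

The csv module quotes fields that contain commas automatically, so a headline like the second one still lands in a single cell.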