стоит попробовать автоматизацию библиотеки селена.он позволяет отбирать данные страницы запроса динамического рендеринга (js или ajax).
from selenium import webdriver
from bs4 import BeautifulSoup
import time
from bs4.element import Tag
driver = webdriver.Chrome('/usr/bin/chromedriver')
google_url = "https://www.google.com/search?q=python" + "&num=" + str(5)
driver.get(google_url)
time.sleep(3)
soup = BeautifulSoup(driver.page_source,'lxml')
result_div = soup.find_all('div', attrs={'class': 'g'})
links = []
titles = []
descriptions = []
for r in result_div:
# Checks if each element is present, else, raise exception
try:
link = r.find('a', href=True)
title = None
title = r.find('h3')
if isinstance(title,Tag):
title = title.get_text()
description = None
description = r.find('span', attrs={'class': 'st'})
if isinstance(description, Tag):
description = description.get_text()
# Check to make sure everything is present before appending
if link != '' and title != '' and description != '':
links.append(link['href'])
titles.append(title)
descriptions.append(description)
# Next loop if one element is not present
except Exception as e:
print(e)
continue
print(titles)
print(links)
print(descriptions)
O / P:
['Welcome to Python.org', 'Download Python | Python.org', 'Python Tutorial - W3Schools', 'Introduction to Python - W3Schools', 'Python Programming Language - GeeksforGeeks', 'Python: 7 Important Reasons Why You Should Use Python - Medium', 'Python: 7 Important Reasons Why You Should Use Python - Medium', 'Python Tutorial - Tutorialspoint', 'Python Download and Installation Instructions', 'Python vs C++ - Find Out The 9 Important Differences - eduCBA', None, 'Description']
['https://www.python.org/', 'https://www.python.org/downloads/', 'https://www.w3schools.com/python/', 'https://www.w3schools.com/python/python_intro.asp', 'https://www.geeksforgeeks.org/python-programming-language/', 'https://medium.com/@mindfiresolutions.usa/python-7-important-reasons-why-you-should-use-python-5801a98a0d0b', 'https://medium.com/@mindfiresolutions.usa/python-7-important-reasons-why-you-should-use-python-5801a98a0d0b', 'https://www.tutorialspoint.com/python/', 'https://www.ics.uci.edu/~pattis/common/handouts/pythoneclipsejava/python.html', 'https://www.educba.com/python-vs-c-plus-plus/', '/search?num=5&q=Python&stick=H4sIAAAAAAAAAONgFuLQz9U3MK0yjFeCs7SEs5Ot9JPzc3Pz86yKM1NSyxMri1cxsqVZOQZ4Fi9iZQuoLMnIzwMAlVPV1j0AAAA&sa=X&ved=2ahUKEwigvcqKx8XiAhUOSX0KHdtmBgoQzTooADAQegQIChAC', 'mailto:?body=Python%20https%3A%2F%2Fwww.google.com%2Fsearch%3Fkgmid%3D%2Fm%2F05z1_%26hl%3Den-IN%26kgs%3De1764a9f31831e11%26q%3DPython%26shndl%3D0%26source%3Dsh%2Fx%2Fkp%26entrypoint%3Dsh%2Fx%2Fkp']
['The official home of the Python Programming Language.', 'Looking for Python 2.7? See below for specific releases. Contribute to the PSF by Purchasing a PyCharm License. All proceeds benefit the PSF. Donate Now\xa0...', 'Python can be used on a server to create web applications. ... Our "Show Python" tool makes it easy to learn Python, it shows both the code and the result.', 'What is Python? Python is a popular programming language. It was created by Guido van Rossum, and released in 1991. It is used for: web development\xa0...', 'Python is a widely used general-purpose, high level programming language. It was initially designed by Guido van Rossum in 1991 and developed by Python\xa0...', None, None, None, None, None, None, None]
, где '/usr/bin/chromedriver'
путь к веб-драйверу селена.
Загрузка веб-драйвера selenium для браузера Chrome:
http://chromedriver.chromium.org/downloads
Установка веб-драйвера для браузера Chrome:
https://christopher.su/2015/selenium-chromedriver-ubuntu/
Учебник Selenium:
https://selenium -python.readthedocs.io /