link = i.find('a',href=True)
всегда не возвращает anchor tag (a)
, может быть возвращено NoneType
, поэтому необходимо проверить, что ссылка отсутствует, продолжить цикл, иначе получить значение ссылки href.
Ссылка на лом по URL:
import re
import requests
from bs4 import BeautifulSoup
r = requests.get("https://www.example.com/blog/author/abc")
data = r.content # Content of response
soup = BeautifulSoup(data, "html.parser")
for i in soup.find_all('div',{'class':'post-info-wrap'}):
link = i.find('a',href=True)
if link is None:
continue
print(link['href'])
Записать ссылку по HTML:
from bs4 import BeautifulSoup
html = '''<div class="post-info-wrap"><h2 class="post-title"><a href="https://www.example.com/blog/111/this-is-1st-post/" title="Example of 1st post – Example 1 Post" rel="bookmark">sample post – example 1 post</a></h2><div class="post-meta clearfix">
<div class="post-info-wrap"><h2 class="post-title"><a href="https://www.example.com/blog/111/this-is-2nd-post/" title="Example of 2nd post – Example 2 Post" rel="bookmark">sample post – example 2 post</a></h2><div class="post-meta clearfix">'''
soup = BeautifulSoup(html, "html.parser")
for i in soup.find_all('div',{'class':'post-info-wrap'}):
link = i.find('a',href=True)
if link is None:
continue
print(link['href'])
Обновление:
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome('/usr/bin/chromedriver')
driver.get("https://www.example.com/blog/author/abc")
soup = BeautifulSoup(driver.page_source, "html.parser")
for i in soup.find_all('div', {'class': 'post-info-wrap'}):
link = i.find('a', href=True)
if link is None:
continue
print(link['href'])
O / P: * * тысяча двадцать-одна
https://www.example.com/blog/911/article-1/
https://www.example.com/blog/911/article-2/
https://www.example.com/blog/911/article-3/
https://www.example.com/blog/911/article-4/
https://www.example.com/blog/random-blog/article-5/
Для браузера Chrome:
http://chromedriver.chromium.org/downloads
Установить веб-драйвер для браузера Chrome:
https://christopher.su/2015/selenium-chromedriver-ubuntu/
учебник по селену
https://selenium -python.readthedocs.io /
Где '/usr/bin/chromedriver'
Путь Chrome Webdriver.