Scrape the hyperlink inside href with BeautifulSoup + Python - PullRequest
1 vote
/ March 6, 2020

I would like to scrape this person's website and blog links from https://lawyers.justia.com/lawyer/robin-d-gross-39828.

Here is what I have so far:

if soup.find('div', attrs={'class': "heading-3 block-title iconed-heading font-w-bold"}) is not None:
    webs = soup.find('div', attrs={'class': "heading-3 block-title iconed-heading font-w-bold"}) 
    print(webs.findAll("href"))

1 Answer

0 votes
/ March 6, 2020
from bs4 import BeautifulSoup
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0'}

r = requests.get(
    "https://lawyers.justia.com/lawyer/robin-d-gross-39828", headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')

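# Match <a> tags whose data-vars-action attribute is either of the listed values,
# then print the URL stored in each tag's href attribute.
for item in soup.find_all("a", {'data-vars-action': ['ProfileWebsite', 'ProfileBlogPost']}):
    print(item.get("href"))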

Output:

http://www.imaginelaw.com/
http://www.imaginelaw.com/lawyer-attorney-1181486.html
http://www.ipjustice.org/internet-governance/icann-accountability-deficits-revealed-in-panel-ruling-on-africa/
http://www.circleid.com/members/5382
http://www.circleid.com/posts/20160301_icann_accountability_proposal_power_of_governments_over_internet
http://www.circleid.com/posts/20151201_supporting_orgs_marginalized_in_icann_accountability_proposal
http://www.circleid.com/posts/20150720_icann_accountability_deficits_revealed_in_panel_ruling_on_africa
http://www.circleid.com/posts/20150401_freedom_of_expression_chilled_by_icann_addition_of_speech
http://www.circleid.com/posts/20150203_proposal_for_creation_of_community_veto_for_key_icann_decisions
http://www.circleid.com/posts/20150106_civil_society_cautions_icann_giving_governments_veto_geo_domains
http://www.circleid.com/posts/20140829_radical_shift_of_power_proposed_at_icann_govts_in_primary_role
http://www.circleid.com/posts/20140821_icanns_accountability_plan_gives_icann_board_total_control
http://www.circleid.com/posts/20140427_a_civil_society_perspective_on_netmundial_final_outcome_document
https://imaginelaw.wordpress.com
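
For anyone wondering why the attempt in the question prints an empty list: the first argument to `find_all` (or `findAll`) is a tag name, so `findAll("href")` looks for `<href>` elements rather than `href` attributes. The snippet below is a minimal offline sketch of the difference. The HTML fragment and the example.com URLs are invented for illustration and only assume that the real page marks its links with a `data-vars-action` attribute, as the answer above relies on.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment for illustration only; the real page markup may differ.
html = """
<div class="block-title">Websites</div>
<a href="http://example.com/" data-vars-action="ProfileWebsite">Website</a>
<a href="http://example.com/blog" data-vars-action="ProfileBlogPost">Blog post</a>
<a href="http://example.com/other" data-vars-action="Other">Other</a>
<a name="top" data-vars-action="ProfileWebsite">Anchor without href</a>
"""

soup = BeautifulSoup(html, "html.parser")

# "href" is treated as a tag name here, and there are no <href> tags.
no_tags = soup.find_all("href")

# A list as an attribute value matches any of its items;
# "href": True keeps only tags that actually carry an href attribute.
wanted = ["ProfileWebsite", "ProfileBlogPost"]
links = [
    a["href"]
    for a in soup.find_all("a", attrs={"data-vars-action": wanted, "href": True})
]

print(list(no_tags))
print(links)
```

Filtering on `"href": True` is optional, but it means `a["href"]` cannot raise `KeyError` on anchors that have no link.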