Get the specified links from a list - PullRequest
0 votes
/ April 1, 2020

I want to print the text of the SECOND <a> tag in each of the first 5 "groupings" and of the first <a> tag in each of the second 5. How would I do that?

https://www.carehome.co.uk/care_search_results.cfm/searchunitary/Tower-Hamlets

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
from selenium import webdriver

# Grab the page and parse it so it is ready for picking apart
my_url = "https://www.carehome.co.uk/care_search_results.cfm/searchunitary/Tower-Hamlets"

driver = webdriver.Chrome(executable_path='C:/Users/lemonade/Documents/work/chromedriver')
driver.get(my_url)
page_s = soup(driver.page_source, features='html.parser')

# Find the relevant divs
containers = page_s.findAll("div", {"class": "home-name"})


for container in containers:
    name_container = container.p
    all_a = name_container.findAll("a")
    print(all_a)

OUTPUT:

[<a name="member_21310"></a>, <a href="https://www.carehome.co.uk/carehome.cfm/searchazref/20001005SILA" style="font-weight:bold;font-size:28px">Silk Court</a>]
[<a name="member_35665"></a>, <a href="https://www.carehome.co.uk/carehome.cfm/searchazref/10001005FITA" style="">Westport Care Home</a>]
[<a name="member_34393"></a>, <a href="https://www.carehome.co.uk/carehome.cfm/searchazref/20001005ASPA" style="font-weight:bold;font-size:28px">Aspen Court Care Home</a>]
[<a name="member_4936"></a>, <a href="https://www.carehome.co.uk/carehome.cfm/searchazref/10001005SYDA" style="">Beaumont Court</a>]
[<a name="member_40189"></a>, <a href="https://www.carehome.co.uk/carehome.cfm/searchazref/20001005HAWA" style="font-weight:bold;font-size:28px">Hawthorn Green Residential and Nursing Home</a>]
[<a href="https://www.carehome.co.uk/carehome.cfm/searchazref/10001005COXA">
                                Coxley House
                            </a>]
[<a href="https://www.carehome.co.uk/carehome.cfm/searchazref/85852">
                                Toby Lodge
                            </a>]
[<a href="https://www.carehome.co.uk/carehome.cfm/searchazref/10001005HOTA">
                                Hotel in the Park
                            </a>]
[<a href="https://www.carehome.co.uk/carehome.cfm/searchazref/10001005RETB">
                                34/35 Huddleston Close
                            </a>]
[<a href="https://www.carehome.co.uk/carehome.cfm/searchazref/10001005APPA">
                                Approach Lodge
                            </a>]

Answers [ 2 ]

0 votes
/ April 1, 2020

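You can do this with BeautifulSoup's CSS selector method select(). The selector below matches only the <a> tags that have an href attribute, so the empty named anchors are skipped and you get the expected result.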

from bs4 import BeautifulSoup as soup
from selenium import webdriver

# Grab the page and parse it so it is ready for picking apart
my_url = "https://www.carehome.co.uk/care_search_results.cfm/searchunitary/Tower-Hamlets"

driver = webdriver.Chrome(executable_path='C:/Users/lemonade/Documents/work/chromedriver')
driver.get(my_url)
page_s = soup(driver.page_source, features='html.parser')
containers = page_s.select("div.home-name>p>a[href]")

for container in containers:
    print(container.text.strip())

Output:

Silk Court
Westport Care Home
Aspen Court Care Home
Beaumont Court
Hawthorn Green Residential and Nursing Home
Coxley House
Toby Lodge
Hotel in the Park
34/35 Huddleston Close
Approach Lodge
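
The selector can also be tried without launching a browser. The following is only a minimal sketch: SAMPLE_HTML is a hypothetical, trimmed-down imitation of the two kinds of result blocks shown in the question's output (it is not the real page source), and extract_names is a helper name introduced here for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the two block shapes from the question:
# one with an empty named anchor before the link, one with the link only.
SAMPLE_HTML = """
<div class="home-name">
  <p><a name="member_21310"></a><a href="https://www.carehome.co.uk/carehome.cfm/searchazref/20001005SILA">Silk Court</a></p>
</div>
<div class="home-name">
  <p><a href="https://www.carehome.co.uk/carehome.cfm/searchazref/10001005COXA">
      Coxley House
  </a></p>
</div>
"""


def extract_names(html):
    """Return the text of every <a href=...> directly inside div.home-name > p."""
    page = BeautifulSoup(html, "html.parser")
    # a[href] skips anchors that only carry a name attribute.
    links = page.select("div.home-name > p > a[href]")
    # get_text(strip=True) drops the surrounding whitespace and newlines.
    return [link.get_text(strip=True) for link in links]


names = extract_names(SAMPLE_HTML)
print(names)
```

Because the filter is on the href attribute rather than on position, the same selector covers both the "second tag" case and the "first tag" case from the question.
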
0 votes
/ April 1, 2020

Could you try the following solution:

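from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Assumes `driver` is the webdriver.Chrome instance created in the question.
driver.get("https://www.carehome.co.uk/care_search_results.cfm/searchunitary/Tower-Hamlets")

# Wait up to 10 seconds until every matching link is visible.
containers = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[contains(@class,'home-name')]//p//a[@href]")))

for container in containers:
    print(container.text)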
