I want to scrape the listing URLs from TripAdvisor search results.
Here is what I have:
```python
import requests
from bs4 import BeautifulSoup
from csv import writer

base_url = 'https://www.tripadvisor.com/'  ## we need this to join the links later ##
main_page = 'https://www.tripadvisor.co.za/Search?q=south%20africa&searchSessionId=2F3220244606B7340E89ADA2757A3F351594636013014ssid&sid=299A954FAE1421A8AFF98152D29A13DC1594642741435&blockRedirect=true&ssrc=e&geo=1'
links = []

## get the initial page to find the number of pages ##
r = requests.get(main_page.format(0))
soup = BeautifulSoup(r.text, "html.parser")

## select the last page from the list of pages ('a', {'class':'pageNum taLnk'}) ##
last_page = 8

## now iterate over that range (first page, last page, number of links), and extract the links from each page ##
for i in range(0, last_page + 30, 30):
    page = main_page.format(i)
    soup = BeautifulSoup(requests.get(page).text, "html.parser")  ## get the next page and parse it with BeautifulSoup ##
    ## get the hrefs from ('div', {'class':'listing_title'}), and join them with base_url to make the links ##
    links += [base_url + link.find('a').get('href') for link in soup.find_all('div', {'class': 'result-card'})]

for link in links:
    print(link)
```
It doesn't work, and I think I'm missing something important. Any ideas?
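A few things in the loop are worth checking. `main_page` contains no `{}` placeholder, so `main_page.format(i)` returns the same URL on every pass (and `range(0, last_page + 30, 30)` with `last_page = 8` only yields offsets 0 and 30). `base_url + href` also gives `https://www.tripadvisor.com//...` when the hrefs are site-relative (start with `/`), while the search itself runs on tripadvisor.co.za. The sketch below shows both fixes using only the standard library. It rests on a few assumptions: the offset parameter name `o` is a guess to be replaced by whatever the site's real "next page" link uses, and `page_url` and `absolute_links` are hypothetical helpers. Separately, the results page may be filled in by JavaScript after it loads, in which case the HTML returned by `requests` contains no `result-card` divs at all. Printing `len(soup.find_all('div', {'class': 'result-card'}))` for the first page would show whether that is the real problem.

```python
from urllib.parse import parse_qsl, urlencode, urljoin, urlsplit, urlunsplit

# ASSUMPTION: name of the query parameter holding the result offset.
# "o" is a placeholder guess; copy the real name from the site's "next page" link.
OFFSET_PARAM = "o"


def page_url(search_url: str, offset: int) -> str:
    """Return search_url with OFFSET_PARAM set to `offset`, keeping other params."""
    parts = urlsplit(search_url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query[OFFSET_PARAM] = str(offset)
    return urlunsplit(parts._replace(query=urlencode(query)))


def absolute_links(page: str, hrefs: list[str]) -> list[str]:
    """Resolve each href against the URL of the page it was scraped from."""
    return [urljoin(page, href) for href in hrefs]


if __name__ == "__main__":
    search = "https://www.tripadvisor.co.za/Search?q=south%20africa&geo=1"

    # str.format ignores arguments it has no placeholder for, so the
    # original loop requested the same URL on every iteration.
    print(search.format(30) == search)  # True

    for offset in range(0, 90, 30):
        print(page_url(search, offset))

    # Hypothetical site-relative href, as it might appear in a result card.
    href = "/Attraction_Review-g1-d1-Example.html"
    print("https://www.tripadvisor.com/" + href)  # double slash, other domain
    print(absolute_links(search, [href])[0])      # same host as the search page
```

Resolving hrefs against the page they came from, rather than a hard-coded `base_url`, keeps the links on the same domain as the search and handles both relative and absolute hrefs.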