I want to scrape data, in an IBM Watson Studio Jupyter notebook, from this search results page:
https://www.aspc.co.uk/search/?PrimaryPropertyType=Rent&SortBy=PublishedDesc&LastUpdated=AddedAnytime&SearchTerm=&PropertyType=Residential&PriceMin=&PriceMax=&Bathrooms=&OrMoreBathrooms=true&Bedrooms=&OrMoreBedrooms=true&HasCentralHeating=false&HasGarage=false&HasDoubleGarage=false&HasGarden=false&IsNewBuild=false&IsDevelopment=false&IsParkingAvailable=false&IsPartExchangeConsidered=false&PublicRooms=&OrMorePublicRooms=true&IsHmoLicense=false&IsAllowPets=false&IsAllowSmoking=false&IsFullyFurnished=false&IsPartFurnished=false&IsUnfurnished=false&ExcludeUnderOffer=false&IncludeClosedProperties=true&ClosedDatesSearch=14&MapSearchType=EDITED&ResultView=LIST&ResultMode=NONE&AreaZoom=13&AreaCenter[lat]=57.14955426557916&AreaCenter[lng]=-2.0927401123046785&EditedZoom=13&EditedCenter[lat]=57.14955426557916&EditedCenter[lng]=-2.0927401123046785
I have tried BeautifulSoup and Selenium (full disclosure: I'm a beginner) with several variations of code. I have looked through dozens of Stack Overflow questions, Medium articles, etc., and can't figure out what I'm doing wrong.
My latest attempt is:
    from bs4 import BeautifulSoup

    # `response` is not defined in the question; presumably it is the result
    # of requests.get() on the search URL above.
    html_soup = BeautifulSoup(response.text, 'html.parser')
    type(html_soup)
    properties_containers = html_soup.find_all('div', class_='information-card property-card col ')
    print(type(properties_containers))
    print(len(properties_containers))
This returns 0:

    <class 'bs4.element.ResultSet'>
    0

Can anyone point me in the right direction as to what I'm doing wrong or missing?
The data you see on the page is loaded via JavaScript. BeautifulSoup cannot execute JavaScript, so the property cards are never present in the HTML it parses. You can, however, use the requests module to load the same data directly from the site's API.
For example:
    import json
    import requests

    url = 'https://www.aspc.co.uk/search/?PrimaryPropertyType=Rent&SortBy=PublishedDesc&LastUpdated=AddedAnytime&SearchTerm=&PropertyType=Residential&PriceMin=&PriceMax=&Bathrooms=&OrMoreBathrooms=true&Bedrooms=&OrMoreBedrooms=true&HasCentralHeating=false&HasGarage=false&HasDoubleGarage=false&HasGarden=false&IsNewBuild=false&IsDevelopment=false&IsParkingAvailable=false&IsPartExchangeConsidered=false&PublicRooms=&OrMorePublicRooms=true&IsHmoLicense=false&IsAllowPets=false&IsAllowSmoking=false&IsFullyFurnished=false&IsPartFurnished=false&IsUnfurnished=false&ExcludeUnderOffer=false&IncludeClosedProperties=true&ClosedDatesSearch=14&MapSearchType=EDITED&ResultView=LIST&ResultMode=NONE&AreaZoom=13&AreaCenter[lat]=57.14955426557916&AreaCenter[lng]=-2.0927401123046785&EditedZoom=13&EditedCenter[lat]=57.14955426557916&EditedCenter[lng]=-2.0927401123046785'
    api_url = 'https://api.aspc.co.uk/Property/GetProperties?{}&Sort=PublishedDesc&Page=1&PageSize=12'

    # reuse the query string of the search page for the API call
    params = url.split('?')[-1]
    data = requests.get(api_url.format(params)).json()

    # uncomment this to see all data received from the server:
    # print(json.dumps(data, indent=4))

    # print some data to screen:
    for property_ in data:
        print(property_['Location']['AddressLine1'])
        print(property_['CategorisationDescription'])
        print('Bedrooms:', property_["Bedrooms"])        # number of bedrooms
        print('Bathrooms:', property_["Bathrooms"])      # number of bathrooms
        print('PublicRooms:', property_["PublicRooms"])  # number of public rooms
        # .. etc.
        print('-' * 80)
Prints:

    44 Roslin Place
    Fully furnished 2 Bdrm 1st flr Flat. Hall. Lounge. Dining kitch. 2 Bdrms. Bathrm (CT band - C). Deposit 1 months rent. Parking. No pets. No smokers. Rent £550 p.m Entry by arr. Viewing contact solicitors. Landlord reg: 871287/100/26061. (EPC band - B).
    Bedrooms: 2
    Bathrooms: 1
    PublicRooms: 1
    --------------------------------------------------------------------------------
    Second Floor Left, 173 Victoria Road
    Unfurnished 1 Bdrm 2nd flr Flat. Hall. Lounge. Dining kitch. Bdrm. Bathrm (CT Band - A). Deposit 1 months rent. No pets. No smokers. Rent £375 p.m Immed entry. Viewing contact solicitors. Landlord reg: 1261711/100/09072. (EPC band - D).
    Bedrooms: 1
    Bathrooms: 1
    PublicRooms: 1
    --------------------------------------------------------------------------------
    102 Bedford Road
    Fully furnished 3 Bdrm 1st flr Flat. Hall. Lounge. Kitch. 3 Bdrms. Bathrm (CT band - B). Deposit 1 months rent. Garden. HMO License. No pets. No smokers. Rent £750 p.m Entry by arr. Viewing contact solicitors. Landlord reg: 49171/100/27130. (EPC band - D).
    Bedrooms: 3
    Bathrooms: 1
    PublicRooms: 1
    --------------------------------------------------------------------------------

    ... and so on.
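
The API URL in the example hard-codes `Page=1&PageSize=12`, so it only covers the first twelve results. A rough sketch of how the URL could be built for other pages, and how each record could be flattened for later use in a table, is shown below. The helper names `build_api_url` and `flatten` are made up for illustration, and it is an assumption (not verified here) that the API honours other `Page` values in the way the parameter name suggests.

```python
from urllib.parse import urlsplit

# Endpoint and fixed parameters copied from the example above.
API_TEMPLATE = (
    "https://api.aspc.co.uk/Property/GetProperties"
    "?{query}&Sort=PublishedDesc&Page={page}&PageSize={page_size}"
)


def build_api_url(search_url, page=1, page_size=12):
    """Build the API URL for one page of results (hypothetical helper)."""
    # Reuse the query string of the search page, as the example above does.
    query = urlsplit(search_url).query
    return API_TEMPLATE.format(query=query, page=page, page_size=page_size)


def flatten(property_):
    """Keep only the fields printed in the example above (hypothetical helper)."""
    return {
        "address": property_["Location"]["AddressLine1"],
        "bedrooms": property_["Bedrooms"],
        "bathrooms": property_["Bathrooms"],
        "public_rooms": property_["PublicRooms"],
    }


if __name__ == "__main__":
    # Shortened search URL, used here only to keep the demonstration readable.
    search_url = "https://www.aspc.co.uk/search/?PrimaryPropertyType=Rent&PropertyType=Residential"
    print(build_api_url(search_url, page=2))

    # A record shaped like the API response, using values from the output above.
    sample = {
        "Location": {"AddressLine1": "44 Roslin Place"},
        "Bedrooms": 2,
        "Bathrooms": 1,
        "PublicRooms": 1,
    }
    print(flatten(sample))
```

Each URL returned by `build_api_url` can be passed to `requests.get(...).json()` exactly as in the example above. Stopping when a page comes back empty would be one way to end the loop, though that behaviour is also an assumption about the API.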