Can't get the href from a div even though I'm targeting its class
0 votes
/ 8 March 2020

I'm trying to get the links to all the products on this site: https://www.officeworks.com.au/shop/officeworks/c/technology/audio-speakers/voice-assistant-speakers

For example, for Google Home Mini Chalk I should get https://www.officeworks.com.au/shop/officeworks/p/google-home-mini-chalk-sygminiwe

However, I can't even reach the div class that wraps the href link. I've tried several approaches, all with bs4. Here are two that I was sure would work, but didn't:

First attempt:

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

url_products = []
url = "https://www.officeworks.com.au/shop/officeworks/c/technology/audio-speakers/voice-assistant-speakers"
req = Request(url)
html_page = urlopen(req)
soup = BeautifulSoup(html_page, "lxml")
data = soup.find_all('div', {'class': 'ProductTile__ProductImageWrapper-sc-1dlojg1-2 gRQAGx'})
for div in data:
    links = div.find_all('a')
    for a in links:
        print('https://www.officeworks.com.au/' + a['href'])
        url_products.append('https://www.officeworks.com.au/' + a['href'])

Second attempt:

from bs4 import BeautifulSoup
import requests

r = requests.get('https://www.officeworks.com.au/shop/officeworks/c/technology/audio-speakers/voice-assistant-speakers')
soup = BeautifulSoup(r.content, 'lxml')
links = [item['href'] for item in soup.select('.gRQAGx > a')]

I suppose I'm not targeting the right class, but I can't work out what is wrong. Thanks in advance!

1 Answer

1 vote
/ 8 March 2020

You aren't getting the expected output because the page is populated by JavaScript, so the product markup isn't in the HTML you download until that JavaScript has been rendered.
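One way to confirm this is to look for the class name in the HTML the server actually returns, before any JavaScript runs. The following is a minimal sketch: `appears_in_static_html` is a hypothetical helper name, the sample markup is a hand-made stand-in for a JS-rendered page, and the real network call is left commented out.

```python
def appears_in_static_html(html: str, marker: str) -> bool:
    """Return True if `marker` occurs anywhere in the raw HTML text."""
    return marker in html


# Hand-made stand-in for a JS-rendered page: the server sends only an
# empty mount point and a script tag, no product tiles.
sample_html = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
print(appears_in_static_html(sample_html, "ProductTile__ProductImageWrapper"))  # False

# Against the real page (requires network access):
# import requests
# html = requests.get(url).text
# print(appears_in_static_html(html, "ProductTile__ProductImageWrapper"))
```

If the check also comes back False for the real page, the tiles are being added in the browser, and no bs4 selector will find them in the downloaded HTML.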

You could use Selenium, but I don't recommend it because it will slow the task down.

Or use HTMLSession from requests_html to render the page on the fly.
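For reference, the requests_html route might look roughly like the sketch below. The function names and the `sleep=2` delay are assumptions for illustration, and the filter relies on product pages having `/shop/officeworks/p/` in their URL, as in the example link from the question. I haven't run it against the live site.

```python
PRODUCT_PATH = "/shop/officeworks/p/"


def keep_product_links(links):
    """Keep only links that point at a product page, sorted for stable output."""
    return sorted(link for link in links if PRODUCT_PATH in link)


def rendered_product_links(url):
    """Render the page with requests_html and return its product links."""
    # Imported lazily so the pure helper above works without requests_html installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    response = session.get(url)
    # render() executes the page's JavaScript in headless Chromium
    # (downloaded on first use); sleep gives the scripts time to finish.
    response.html.render(sleep=2)
    return keep_product_links(response.html.absolute_links)
```

Rendering still starts a headless browser behind the scenes, so it is noticeably slower than a plain request.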

Otherwise, let's go straight to the source: the API that the JavaScript fetches its data from.

I found it by tracing the XHR request in the Network tab of the browser developer tools (Ctrl+Shift+E in Firefox, for example).
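The captured request body carries its search options in a URL-encoded `params` string, which is hard to read as-is. A small sketch of decoding such a string with the standard library; the `captured` value here is a shortened copy for illustration, not the full payload.

```python
from urllib.parse import parse_qs

# A shortened copy of a captured "params" value, for illustration only.
captured = (
    "query=&hitsPerPage=24&page=0"
    "&filters=(categorySeoPaths%3A%22technology%2Faudio-speakers%2Fvoice-assistant-speakers%22)"
)

# keep_blank_values=True preserves empty fields such as "query=".
decoded = {
    key: values[0]
    for key, values in parse_qs(captured, keep_blank_values=True).items()
}

for key, value in decoded.items():
    print(f"{key}: {value!r}")
```

Decoded this way, the request turns out to ask for page 0 with 24 hits per page, filtered to the voice-assistant-speakers category path.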

So we can call it directly:

import requests

json = {"requests": [{"indexName": "prod-product-wc-bestmatch-personal", "params": "query=&hitsPerPage=24&maxValuesPerFacet=10&page=0&highlightPreTag=%3Cais-highlight-0000000000%3E&highlightPostTag=%3C%2Fais-highlight-0000000000%3E&clickAnalytics=true&optionalFilters=%5B%5D&sumOrFiltersScores=true&filters=(categorySeoPaths%3A%22technology%2Faudio-speakers%2Fvoice-assistant-speakers%22)&facets=%5B%22rangedOnline%22%2C%22forestProductSchemeName%22%2C%22hardDriveType%22%2C%22bagStyle%22%2C%22socketType%22%2C%22fullSizeInnerDimensions%22%2C%22stapleSize%22%2C%22connectivity%22%2C%22smartHomeCompatibility%22%2C%22industryType%22%2C%22sizeCapacity%22%2C%22performancePrintResolution%22%2C%22handsetIncludedHandsets%22%2C%22usbFlashLidType%22%2C%22videoResolution%22%2C%22maximumPunchingCapacity%22%2C%22rangedRetail%22%2C%22protectionType%22%2C%22rulerLength%22%2C%22sizeNumber%22%2C%22deviceConnectivityTechnology%22%2C%22unitsOfMeasure%22%2C%22selfAdhesive%22%2C%22interfaceHardDrive%22%2C%22sharpenerSize%22%2C%22connectivityWifiBands%22%2C%22microphoneType%22%2C%22labellerKeyboardLayout%22%2C%22numberOfUsb30Ports%22%2C%22operatingSystemEdition%22%2C%22ringRingSize%22%2C%22performanceHealthMonitoringFunctions%22%2C%22connectivityTechnology%22%2C%22dualSimCompatible%22%2C%22audioSource%22%2C%22totalNumberOfLabels%22%2C%22brushShape%22%2C%22maxProcessorClockSpeed%22%2C%22operatingHand%22%2C%22powerBatteryTechnology%22%2C%22travelRegion%22%2C%22capacityBinder%22%2C%22licenceValidityPeriod%22%2C%22storageHardDriveCapacity%22%2C%22spineSize%22%2C%22rollLength%22%2C%22numberOfRings%22%2C%22lightBulbType%22%2C%22colour%22%2C%222SidedCopying%22%2C%22automaticDocumentFeederCapacity%22%2C%22automaticPaperFeed%22%2C%22performanceShredderCutType%22%2C%22performanceBrightness%22%2C%22displayResolution%22%2C%22labellingOfficeUseFacet%22%2C%22securityLevel%22%2C%22maxSupportedDocumentSize%22%2C%22bulkbuyOnline%22%2C%22staplingCapacity%22%2C%22storageIncludedFlashMemory%22%2C%22compatibabilityCustomFitAndroid%22%2C%22drawerNumberOfDrawers%22%2C%22storageInternalMemorySize%22%2C%22ramInstalledSize%22%2C%22100RecycledProduct%22%2C%22placementPlacingMounting%22%2C%22earPlacement%22%2C%22foldedDimensions%22%2C%22portsTotalNumberOfNetworkingPorts%22%2C%22powerBatteryChargeAmpHours%22%2C%22noiseCancelling%22%2C%22surfaceShape%22%2C%22labellingHomeUseFacet%22%2C%22sizeDescription%22%2C%22maxLoadWeight%22%2C%22numberOfPowerPorts%22%2C%22compatibabilityCustomFitApple%22%2C%22tsaApproved%22%2C%22chassisType%22%2C%22surgeSuppression%22%2C%22printingTechnologyPrinters%22%2C%22placementVesaMountCompatibility%22%2C%22boardSizeFacet%22%2C%22frameStyle%22%2C%22serviceProvider%22%2C%22bluetoothCompatibility%22%2C%22scannerType%22%2C%22photoCapacityQuantity%22%2C%22numberOfUsb20Ports%22%2C%22rulingType%22%2C%22learningSkillsFocus%22%2C%22licenceType%22%2C%22connectivityDisplayConnections%22%2C%22performanceMaxThickness%22%2C%22performanceResolution%22%2C%22paperWeightGsm%22%2C%22numberOfProcessorCores%22%2C%22fitsDevice%22%2C%22brushhairtype%22%2C%22opticalZoom%22%2C%22processorClockSpeed%22%2C%22labellingIndustrialUseFacet%22%2C%22performanceApproximateNumberOfImpressions%22%2C%222SidedPrinting%22%2C%22powerPowerType%22%2C%22interfaceType%22%2C%22printerConnectivityTechnology%22%2C%22numberOfReamsPerCarton%22%2C%22baseWheels%22%2C%22performanceEstimatedCartridgeYieldSheets%22%2C%22papersize%22%2C%22processorType%22%2C%22wallStrengthThickness%22%2C%22storageHardDriveCapacityComputingDevices%22%2C%22ciewhiteness%22%2C%22runTime%22%2C%22stampInking%22%2C%22switched%22%2C%22processorManufacturer%22%2C%22deviceCaseCompatibility%22%2C%22caseFeaturesNumberOfCompartments%22%2C%22displaySize%22%2C%222sidedScanning%22%2C%22glutenFree%22%2C%22restTime%22%2C%22operatingPlatformCompatibility%22%2C%22powerSource%22%2C%22touchScreen%22%2C%22displayPanelType%22%2C%22secondaryProcessorType%22%2C%22wastebinCapacityRange%22%2C%22softwareDistributionMedia%22%2C%22learningAgeRange%22%2C%22tapeWidth%22%2C%22storageStorageCapacity%22%2C%22cableLength%22%2C%22skillLevel%22%2C%22flightTime%22%2C%22energyRating%22%2C%22maximumRecommendedDailyUsage%22%2C%22contentLayout%22%2C%22deviceLocation%22%2C%22brand%22%2C%22numberOfUsb31Ports%22%2C%22lidIncluded%22%2C%22scannerScanResolution%22%2C%22portsNumberOfUsbChargePorts%22%2C%22envelopeSize%22%2C%22keyboardCompatibility%22%2C%22primaryCameraVideo%22%2C%22supportedMemoryCards%22%2C%22connectivityDisplayConnectionsPanels%22%2C%22up1Category%22%2C%22price%22%2C%22categorySeoPaths%22%2C%22rangedRetail%22%2C%22rangedOnline%22%2C%22price%22%2C%22brand%22%2C%22colour%22%2C%22audioSource%22%2C%22cableLength%22%2C%22up1Category%22%2C%22bulkbuyOnline%22%2C%22microphoneType%22%2C%22noiseCancelling%22%2C%22bluetoothCompatibility%22%2C%22powerBatteryTechnology%22%2C%22smartHomeCompatibility%22%5D&tagFilters=&facetFilters=%5B%5B%22categorySeoPaths%3Atechnology%2Faudio-speakers%2Fvoice-assistant-speakers%22%5D%5D"}, {"indexName": "prod-product-wc-bestmatch-personal", "params": "query=&hitsPerPage=1&maxValuesPerFacet=10&page=0&highlightPreTag=%3Cais-highlight-0000000000%3E&highlightPostTag=%3C%2Fais-highlight-0000000000%3E&clickAnalytics=false&optionalFilters=%5B%5D&sumOrFiltersScores=true&filters=(categorySeoPaths%3A%22technology%2Faudio-speakers%2Fvoice-assistant-speakers%22)&attributesToRetrieve=%5B%5D&attributesToHighlight=%5B%5D&attributesToSnippet=%5B%5D&tagFilters=&analytics=false&facets=categorySeoPaths"}]}
r = requests.post("https://k535caawve-dsn.algolia.net/1/indexes/*/queries?x-algolia-agent=Algolia%20for%20JavaScript%20(3.35.1)%3B%20Browser%20(lite)%3B%20react-instantsearch%205.4.0%3B%20JS%20Helper%202.26.1&x-algolia-application-id=K535CAAWVE&x-algolia-api-key=8a831febe0110932cfa06ff0e2024b4f", json=json).json()

for item in r['results'][0]['hits']:
    print("Name: {:<65}, Url: {}".format(
        item['name'], f"https://www.officeworks.com.au/shop/officeworks/p/{item['urlKeyword']}"))
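Since the question wanted the links collected in a list (`url_products`) rather than printed, the same fields can be gathered as sketched below. `product_urls` is a hypothetical helper name, and `sample` is a hand-made stand-in with the same shape as the two fields used above, not real API output.

```python
BASE = "https://www.officeworks.com.au/shop/officeworks/p/"


def product_urls(response_json):
    """Build product page URLs from the first result set of the response."""
    hits = response_json["results"][0]["hits"]
    return [BASE + hit["urlKeyword"] for hit in hits]


# Hand-made stand-in mirroring the structure read by the loop above.
sample = {"results": [{"hits": [
    {"name": "Google Home Mini Chalk", "urlKeyword": "google-home-mini-chalk-sygminiwe"},
    {"name": "Google Home", "urlKeyword": "google-home-sygghome"},
]}]}

url_products = product_urls(sample)
print(url_products)
```

With the real response, `product_urls(r)` would take the place of the call on `sample`.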

Output:

Name: Google Home Mini Chalk                                           , Url: https://www.officeworks.com.au/shop/officeworks/p/google-home-mini-chalk-sygminiwe
Name: Google Home Mini Charcoal                                        , Url: https://www.officeworks.com.au/shop/officeworks/p/google-home-mini-charcoal-sygminibk
Name: Google Nest Hub Max Charcoal                                     , Url: https://www.officeworks.com.au/shop/officeworks/p/google-nest-hub-max-charcoal-sygnhmaxbk
Name: Google Nest Hub Max Chalk                                        , Url: https://www.officeworks.com.au/shop/officeworks/p/google-nest-hub-max-chalk-sygnhmaxwe
Name: Google Home                                                      , Url: https://www.officeworks.com.au/shop/officeworks/p/google-home-sygghome
Name: Ultimate Ears Megablast Wireless Speaker with Alexa Graphite     , Url: https://www.officeworks.com.au/shop/officeworks/p/ultimate-ears-megablast-wireless-speaker-with-alexa-graphite-inmblastbk
Name: Google Nest Mini 2nd Generation Charcoal                         , Url: https://www.officeworks.com.au/shop/officeworks/p/google-nest-mini-2nd-generation-charcoal-sygnmini2c
Name: Google Nest Mini 2nd Generation Chalk                            , Url: https://www.officeworks.com.au/shop/officeworks/p/google-nest-mini-2nd-generation-chalk-sygnmini2w
Name: Ultimate Ears Blast Wireless Speaker with Alexa Graphite         , Url: https://www.officeworks.com.au/shop/officeworks/p/ultimate-ears-blast-wireless-speaker-with-alexa-graphite-imblastbk
Name: Amazon 5.5" Echo Show 5 Charcoal                                 , Url: https://www.officeworks.com.au/shop/officeworks/p/amazon-5-5-echo-show-5-charcoal-syecosh5cl
Name: Amazon Echo 3rd Generation Charcoal                              , Url: https://www.officeworks.com.au/shop/officeworks/p/amazon-echo-3rd-generation-charcoal-syaedotclc
Name: JBL Flip Essential Bluetooth Speaker Gun Metal                   , Url: https://www.officeworks.com.au/shop/officeworks/p/jbl-flip-essential-bluetooth-speaker-gun-metal-imjblfless
Name: Ultimate Ears Megablast Wireless Speaker with Alexa Blue         , Url: https://www.officeworks.com.au/shop/officeworks/p/ultimate-ears-megablast-wireless-speaker-with-alexa-blue-inmblastbe
Name: Amazon Echo Dot 3rd Gen With Clock Sandstone                     , Url: https://www.officeworks.com.au/shop/officeworks/p/amazon-echo-dot-3rd-gen-with-clock-sandstone-syaedotcls
Name: Ultimate Ears Megablast Wireless Speaker with Alexa Merlot       , Url: https://www.officeworks.com.au/shop/officeworks/p/ultimate-ears-megablast-wireless-speaker-with-alexa-merlot-inmblastrd
Name: Amazon Echo Dot 3rd Gen Heather Grey                             , Url: https://www.officeworks.com.au/shop/officeworks/p/amazon-echo-dot-3rd-gen-heather-grey-syamdot3ng
Name: Lenovo Smart Clock E27 Starter Pack                              , Url: https://www.officeworks.com.au/shop/officeworks/p/lenovo-smart-clock-e27-starter-pack-sylsmcbun2
Name: Amazon 5.5" Echo Show 5 Sandstone                                , Url: https://www.officeworks.com.au/shop/officeworks/p/amazon-5-5-echo-show-5-sandstone-syecosh5ss
Name: Amazon Echo Studio Black                                         , Url: https://www.officeworks.com.au/shop/officeworks/p/amazon-echo-studio-black-syastudiob
Name: Lenovo Smart Clock B22 Starter Pack                              , Url: https://www.officeworks.com.au/shop/officeworks/p/lenovo-smart-clock-b22-starter-pack-sylsmcbun1
Name: JBL Link View Speaker with Google Assistant                      , Url: https://www.officeworks.com.au/shop/officeworks/p/jbl-link-view-speaker-with-google-assistant-injblinkvw
Name: Ultimate Ears Blast Wireless Speaker with Alexa Blue Steel       , Url: https://www.officeworks.com.au/shop/officeworks/p/ultimate-ears-blast-wireless-speaker-with-alexa-blue-steel-imblastbe
Name: LG WK7 ThinQ WiFi/Bluetooth Speaker with Google Assistant        , Url: https://www.officeworks.com.au/shop/officeworks/p/lg-wk7-thinq-wifi-bluetooth-speaker-with-google-assistant-inlgthinkq