Web scrape returns an empty list after visiting each href link - PullRequest
0 votes
/ April 24, 2019

The code below gives me the correct href links to the product detail pages, but my scraping result is an empty list. I want to get the product description shown under the "Add to Cart" button. What am I missing here?

Output:

https://www.nike.com/t/nikecourt-air-zoom-vapor-x-mens-hard-court-tennis-shoe-6J0fk8/AA8030-103
[]
https://www.nike.com/t/nikecourt-zoom-cage-3-mens-hard-court-tennis-shoe-mbXWvX
[]

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
import time
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from random import randint
from time import sleep

final = []
with requests.Session() as s:
    driver = webdriver.Chrome('/Users/Selenium/bin/chromedriver')
    ###########THIS IS THE URL 
    driver.get('https://store.nike.com/us/en_us/pw/mens-tennis-shoes/7puZ8r0Zoi3')
    products = [element for element in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='grid-item fullSize']")))]
    driver.execute_script('el = document.elementFromPoint(47, 457); el.click();')
    soup = bs(driver.page_source, 'lxml')
    items  = soup.select('.grid-item-content')
    titles  = [item.find("p", {"class" : lambda L: L and L.startswith('product-display-name')}).text.strip() for item in items]
    links = [item.find('a')['href'] for item in items]
    results = list(zip(titles, links))
    df = pd.DataFrame(results)
    for result in results:
        res = s.get(result[1])
        soup = bs(res.content, 'lxml')
        print(result[1])
        details = [item for item in soup.select('.description-preview fs16-sm css-1pbvugb')]
        print(details)
driver.quit()

Answers [ 2 ]

1 vote
/ April 24, 2019

I tried to see whether I could go straight to the API and pull the data from there, but I couldn't find it. However, the data is available inside the <script> tags as JSON. You just need to locate it and then iterate over it to get what you want. The price, customer reviews, and all kinds of other data are in there as well:

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
import time
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from random import randint
from time import sleep
import json

final = []
with requests.Session() as s:
    s.headers.update({'Accept-Language': 'en-US'})
    driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
    ###########THIS IS THE URL 
    driver.get('https://store.nike.com/us/en_us/pw/mens-tennis-shoes/7puZ8r0Zoi3')
    products = [element for element in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='grid-item fullSize']")))]
    driver.execute_script('el = document.elementFromPoint(47, 457); el.click();')
    soup = bs(driver.page_source, 'lxml')
    items  = soup.select('.grid-item-content')
    titles  = [item.find("p", {"class" : lambda L: L and L.startswith('product-display-name')}).text.strip() for item in items]
    links = [item.find('a')['href'] for item in items]
    results = list(zip(titles, links))
    df = pd.DataFrame(results)
    for result in results:

        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36',
                'Accept-Language': 'en-US'}

        res = s.get(result[1], headers=headers )
        soup = bs(res.text, 'lxml')
        print(result[1])

        details = None  # reset per page so a stale or undefined value is never printed
        scripts = soup.find_all('script')
        for script in scripts:
            if 'window.INITIAL_REDUX_STATE=' in script.text:
                jsonStr = script.text.split('window.INITIAL_REDUX_STATE=')[1]
                jsonStr = jsonStr.rsplit(';',1)[0]
                jsonData = json.loads(jsonStr)

                for k, v in jsonData['Threads']['products'].items():
                    details = bs(v['description'], 'lxml').text
        print(details,'\n')
driver.quit()

Output:

https://www.nike.com/t/nikecourt-air-zoom-vapor-x-mens-hard-court-tennis-shoe-6J0fk8/AA8030-103
ULTRALIGHT SPEED.With Nike Zoom Air and a Dynamic Fit system, the NikeCourt Air Zoom Vapor X provides ultimate control on hard courts.Secure FitThe Dynamic Fit system wraps your foot from the bottom of the arch up to the laces for a glove-like fit.Responsive CushioningA Zoom Air unit in the heel offers low-profile, resilient cushioning from swing to swing.Quick StabilityThe full-length TPU foot frame wraps up the outside of your foot for added stability on every turn and swing.More BenefitsPadded collar provides additional comfort.Built up rubber on the toe increases durability and protection from drag.Non-marking rubber outsole for durable traction on hard courts.Shown: Black/Bright Crimson/WhiteStyle: AA8030-016 

https://www.nike.com/t/nikecourt-zoom-cage-3-mens-hard-court-tennis-shoe-mbXWvX
STRENGTH AND SPEED.The NikeCourt Zoom Cage 3 is made for the player seeking strength and speed on the hard court. The shoe’s unique cage design provides maximum durability and cushioning, and is also lighter than ever.Maximum DurabilityMade with a lightweight CPU cage built up in the high wear zone areas specific to tennis. “Zoned” cage adds stability without adding weight.Exceptional TractionThe modified herringbone outsole delivers excellent traction and durability. Ideal for hard court surfaces.
Complete ComfortNike Zoom Air unit in the heel delivers responsive, lightweight cushioning.More BenefitsExternal heel clip is efficiently shaped to secure the heel.Flexible support in the midfoot provides lightweight stability.Full bootie construction wraps your foot for a snug fit.Kurim material on upper allows for elasticity and flexibility.Shown: White/Light Carbon/Light Blue Fury/ObsidianStyle: 918193-104 

https://www.nike.com/t/nikecourt-air-zoom-zero-mens-tennis-shoe-nHMRHN
COURT FEEL, OPTIMIZED.Featuring the first full-length Zoom Air unit in NikeCourt history, the NikeCourt Air Zoom Zero delivers exceptional responsiveness and great court feel. Its snug-fitting upper and webbed lacing system offer second-skin-like comfort and lockdown.BenefitsFull-length Zoom Air unit is curved to deliver responsive cushioning.Integrated crash pad helps promote a smooth heel-to-toe transition.1/2 sleeve provides a snug, sock-like fit.Gilly straps on the medial and lateral side integrate with the laces for a customizable fit.Midsole foam on top of the front Zoom Air unit brings the unit closer to the ground.Midsole foam underneath the back of the Zoom Air unit brings the unit closer to your heel.Outsole is cored out in the middle to reduce weight and show off the Zoom Air unit.Outsole material wraps over the toe on the medial side for added durability while sliding.Shown: Vast Grey/Indigo ForceStyle: AA8018-044 
...
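
The split/rsplit step in the code above relies on the script ending with a single trailing semicolon. A slightly more tolerant variant, sketched here with only the standard library, uses json.JSONDecoder.raw_decode, which parses one JSON value and ignores whatever follows it. The helper name extract_state and the sample script text are made up for illustration; only the window.INITIAL_REDUX_STATE= marker and the Threads / products / description keys come from the code above.

```python
import json

MARKER = "window.INITIAL_REDUX_STATE="


def extract_state(script_text, marker=MARKER):
    """Return the JSON object assigned after `marker`, or None if absent."""
    start = script_text.find(marker)
    if start == -1:
        return None
    # raw_decode does not skip leading whitespace, so strip it first.
    payload = script_text[start + len(marker):].lstrip()
    # raw_decode parses one JSON value and returns (value, end_index),
    # so a trailing ';' or further JavaScript does not break parsing.
    state, _end = json.JSONDecoder().raw_decode(payload)
    return state


# Hypothetical script body, shaped like the structure the answer reads.
sample_script = (
    'window.INITIAL_REDUX_STATE={"Threads": {"products": {'
    '"AA8030-103": {"description": "<p>Light; fast.</p>"}}}};'
    "window.other = 1;"
)

state = extract_state(sample_script)

# Keep one description per product key instead of a single variable.
descriptions = {
    style: product["description"]
    for style, product in state["Threads"]["products"].items()
}
print(descriptions)
```

Collecting the descriptions in a dict keyed by product ID keeps every entry, whereas assigning to one details variable inside the loop keeps only the last product seen.
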
1 vote
/ April 24, 2019

It looks like the content is rendered onto the page by JavaScript. You can grab driver.page_source again inside the loop.

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
import time
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from random import randint
from time import sleep
#'/Users/Selenium/bin/chromedriver'
final = []

with requests.Session() as s:
    driver = webdriver.Chrome('/Users/Selenium/bin/chromedriver')
    ###########THIS IS THE URL
    driver.get('https://store.nike.com/us/en_us/pw/mens-tennis-shoes/7puZ8r0Zoi3')
    products = [element for element in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='grid-item fullSize']")))]
    driver.execute_script('el = document.elementFromPoint(47, 457); el.click();')
    soup = bs(driver.page_source, 'lxml')
    items  = soup.select('.grid-item-content')
    titles  = [item.find("p", {"class" : lambda L: L and L.startswith('product-display-name')}).text.strip() for item in items]
    links = [item.find('a')['href'] for item in items]
    results = list(zip(titles, links))
    df = pd.DataFrame(results)
    for result in results:
        driver.get(result[1])
        soup = bs(driver.page_source, 'lxml')
        print(result[1])
        details = [item.text for item in soup.select('.description-preview.fs16-sm.css-1pbvugb')]
        print(details)
driver.quit()

Output:

https://www.nike.com/t/nikecourt-air-zoom-vapor-x-mens-hard-court-tennis-shoe-6J0fk8/AA8030-103
['With Nike Zoom Air and a Dynamic Fit system, the NikeCourt Air Zoom Vapor X provides ultimate control on hard courts.Shown: White/BlackStyle: AA8030-103']
https://www.nike.com/t/nikecourt-zoom-cage-3-mens-hard-court-tennis-shoe-mbXWvX
['The NikeCourt Zoom Cage 3 is made for the player seeking strength and speed on the hard court. The shoe’s unique cage design provides maximum durability and cushioning, and is also lighter than ever.Shown: Black/WhiteStyle: 918193-006']
https://www.nike.com/t/nikecourt-air-zoom-zero-mens-tennis-shoe-nHMRHN
['Featuring the first full-length Zoom Air unit in NikeCourt history, the NikeCourt Air Zoom Zero delivers exceptional responsiveness and great court feel. Its snug-fitting upper and webbed lacing system offer second-skin-like comfort and lockdown.Shown: Black/Black/WhiteStyle: AA8018-003']
https://www.nike.com/t/nikecourt-air-max-wildcard-mens-tennis-shoe-p9NhX7
['The NikeCourt Air Max Wildcard delivers the comfort you need to hit hard and move fast on the court. A Max Air unit under your heel cushions every step, while an innovative Lunarlon midsole provides a springy underfoot sensation and extra stability.Shown: Black/Phantom/Bright Crimson/PhantomStyle: AO7351-006']
https://www.nike.com/t/nikecourt-zoom-cage-3-mens-hard-court-tennis-shoe-l3qpKZ/918193-005
['The NikeCourt Zoom Cage 3 is made for the player seeking strength and speed on the hard court. The shoe’s unique cage design provides maximum durability and cushioning, and is also lighter than ever.Shown: Platinum Tint/Laser Fuchsia/Thunder GreyStyle: 918193-005']
https://www.nike.com/t/nikecourt-air-zoom-resistance-mens-hard-court-tennis-shoe-qmZW1o/918194-003
['The\xa0NikeCourt Air Zoom Resistance delivers lightweight durability on the hard court with a performance leather upper.Shown: Black/Bright Crimson/WhiteStyle: 918194-003']
https://www.nike.com/t/nikecourt-air-zoom-prestige-mens-hard-court-tennis-shoe-vY8981
['The NikeCourt Air Zoom Prestige combines the responsiveness of Zoom Air technology with the lockdown of Dynamic Fit for glove-like comfort and support on hard courts.Shown: Vast Grey/Indigo Force/Indigo ForceStyle: AA8020-054']
https://www.nike.com/t/nikecourt-lite-mens-hard-court-tennis-shoe-7qqvCd
['The NikeCourt Lite is built for total comfort with a premium upper and a durable outsole designed for hard\xa0courts.Shown: White/Medium Grey/BlackStyle: 845021-100']
https://www.nike.com/t/nikecourt-lite-mens-hard-court-tennis-shoe-VrTWWAE1/845021-054
['The NikeCourt Lite is built for total comfort with a premium upper and a durable outsole designed for hard\xa0courts.Shown: Vast Grey/Indigo ForceStyle: 845021-054']
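
The selector in this answer also differs from the one in the question. In CSS, classes on the same element are chained with dots, while a space means "descendant", so '.description-preview fs16-sm css-1pbvugb' looks for a <css-1pbvugb> tag nested inside a <fs16-sm> tag. Here is a minimal sketch on a made-up HTML fragment (the markup is an assumption, only the class names come from this thread). It also passes a separator to get_text so adjacent blocks are not fused together, as in "courts.Shown:" in the output above.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; the real page markup may differ.
html = """
<div class="description-preview fs16-sm css-1pbvugb">
  <p>Provides ultimate control on hard courts.</p>
  <ul><li>Shown: White/Black</li><li>Style: AA8030-103</li></ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Space-separated: descendant combinators with tag names, matches nothing here.
wrong = soup.select(".description-preview fs16-sm css-1pbvugb")

# Dot-chained: a single element carrying all three classes.
right = soup.select(".description-preview.fs16-sm.css-1pbvugb")

# separator=" " puts a space between text nodes; strip=True trims each one
# and drops the whitespace-only nodes between tags.
texts = [el.get_text(separator=" ", strip=True) for el in right]

print(wrong)
print(texts)
```

Class names such as css-1pbvugb look auto-generated and may change between site builds, so matching on .description-preview alone could turn out to be more stable. That is a guess, not something verified against the site.
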