Как правильно определить контейнеры при просмотре веб-страниц на сайте покупок (Chelsea F C Megastore) - PullRequest
0 голосов
/ 21 марта 2020

enter image description here Я пытаюсь создать веб-сценарий очистки для определенной страницы на Chelsea F C Сайт мегамаркета, где продаются товары Chelsea. URL, который я пытаюсь разобрать, таков.

Я использую следующий скрипт:

#! python3

# import the libraries for web scrapping

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

# define the url

my_url = 'https://www.chelseamegastore.com/stores/chelsea/en/c/kits/away-kit'

# opening up connection and grabbing a page
uClient = uReq(my_url)
page_html = uClient.read()

# closing the connection
uClient.close()

# parse the html
page_soup = soup(page_html, "html.parser")

# grabs each product
containers = page_soup.findAll("div", {"class": "browseProduct productViewItem ng-scope threeItems"})

Я не могу разобрать информацию о продукте в ' переменная контейнеров с этого момента. Это другие варианты, которые я пробовал, исходя из моего ограниченного понимания дизайна веб-страницы.

containers = page_soup.findAll("div",{"class_": "ng-isolate-scope"})

containers = page_soup.findAll("div",{"data-product": "product"})

containers = page_soup.findAll("div",{"ng-class": "productViewStyle"})

containers = page_soup.findAll("div",{"data-ng-repeat": "product in productData"})

containers = page_soup.findAll("div",{"class_": "browseProduct productViewItem ng-scope threeItems"})

Ожидаемый результат: Переменная списка, содержащая информацию обо всех продуктах на странице. Информация для каждого продукта будет следующей:

  1. Название продукта
  2. URL изображения продукта
  3. Цена продукта

1 Ответ

1 голос
/ 22 марта 2020

Веб-сайт использует код JavaScript для динамической визуализации своих данных после загрузки страницы. Таким образом, вы можете использовать Selenium или requests-html et c.

. Но так как мы можем определить, откуда данные отображаются с помощью XHR запроса к Back-End API, поэтому мы можем вызвать его напрямую через следующий код:

import requests
import json

data = "{\"searchTerm\":\"\",\"startIndex\":0,\"itemsToReturn\":\"60\",\"categories\":[\"kits\",\"away-kit\"],\"multiSelectFilters\":[],\"priceFilter\":{},\"generateSCodeScript\":true,\"showMoreFilters\":[]}"

r = requests.post(
    "https://www.chelseamegastore.com/stores/chelsea/en/Product/DoSearch", json=data).json()


data = json.dumps(r, indent=4)

print(data)

Вывод:

{
    "Products": [
        {
            "Name": "Chelsea Third Stadium Shirt 2019-20 with Gilmour 47 printing",
            "FullName": "Chelsea Third Stadium Shirt 2019-20 with Gilmour 47 printing",
            "Price": "$76.30",
            "Id": 1080300,
            "WasPrice": "$86.11",
            "ShowWasPrice": true,
            "ImageUrl": "//productview1.fanobject.com/0108/0300/01080300_00.jpg?imwidth=250",
            "Url": "/stores/chelsea/en/product/chelsea-third-stadium-shirt-2019-20-with-gilmour-47-printing/1080300",
            "HighlightedBackgroundHexColour": null,
            "HighlightedForegroundHexColour": null,
            "HighlightedText": null,
            "Categories": {
                "HasCategoryNames": false
            },
            "Sizes": null,
            "MembersPrice": "",
            "CustomerReviewSummary": {
                "AverageRating": 0.0,
                "NumberOfReviews": 0
            }
        },
        {
            "Name": "Chelsea Third Cup Stadium Shirt 2019-20 with Gilmour 47 printing",
            "FullName": "Chelsea Third Cup Stadium Shirt 2019-20 with Gilmour 47 printing",
            "Price": "$76.30",
            "Id": 1080321,
            "WasPrice": "$86.11",
            "ShowWasPrice": true,
            "ImageUrl": "//productview2.fanobject.com/0108/0321/01080321_00.jpg?imwidth=250",
            "Url": "/stores/chelsea/en/product/chelsea-third-cup-stadium-shirt-2019-20-with-gilmour-47-printing/1080321",
            "HighlightedBackgroundHexColour": null,
            "HighlightedForegroundHexColour": null,
            "HighlightedText": null,
            "Categories": {
                "HasCategoryNames": false
            },
            "Sizes": null,
            "MembersPrice": "",
            "CustomerReviewSummary": {
                "AverageRating": 0.0,
                "NumberOfReviews": 0
            }
        },
        {
            "Name": "Chelsea Third Cup Stadium Shirt 2019-20 with Marcos A. 3 printing",
            "FullName": "Chelsea Third Cup Stadium Shirt 2019-20 with Marcos A. 3 printing",
            "Price": "$76.30",
            "Id": 1078711,
            "WasPrice": "$115.50",
            "ShowWasPrice": true,
            "ImageUrl": "//productview2.fanobject.com/0107/8711/01078711_00.jpg?imwidth=250",
            "Url": "/stores/chelsea/en/product/chelsea-third-cup-stadium-shirt-2019-20-with-marcos-a.-3-printing/1078711",
            "HighlightedBackgroundHexColour": null,
            "HighlightedForegroundHexColour": null,
            "HighlightedText": null,
            "Categories": {
                "HasCategoryNames": false
            },
            "Sizes": null,
            "MembersPrice": "",
            "CustomerReviewSummary": {
                "AverageRating": 0.0,
                "NumberOfReviews": 0
            }
        },
        {
            "Name": "Chelsea Third Cup Stadium Shirt 2019-20 with England 9 printing",
            "FullName": "Chelsea Third Cup Stadium Shirt 2019-20 with England 9 printing",
            "Price": "$76.30",
            "Id": 1078857,
            "WasPrice": "$115.50",
            "ShowWasPrice": true,
            "ImageUrl": "//productview1.fanobject.com/0107/8857/01078857_00.jpg?imwidth=250",
            "Url": "/stores/chelsea/en/product/chelsea-third-cup-stadium-shirt-2019-20-with-england-9-printing/1078857",
            "HighlightedBackgroundHexColour": null,
            "HighlightedForegroundHexColour": null,
            "HighlightedText": null,
            "Categories": {
                "HasCategoryNames": false
            },
            "Sizes": null,
            "MembersPrice": "",
            "CustomerReviewSummary": {
                "AverageRating": 0.0,
                "NumberOfReviews": 0
            }
        },
        {
            "Name": "Chelsea Third Stadium Shirt 2019-20",
            "FullName": "Chelsea Third Stadium Shirt 2019-20",
            "Price": "$58.80",
            "Id": 264497,
            "WasPrice": "$98.00",
            "ShowWasPrice": true,
            "ImageUrl": "//productview2.fanobject.com/0026/4497/00264497_00.jpg?imwidth=250",
            "Url": "/stores/chelsea/en/product/chelsea-third-stadium-shirt-2019-20/264497",
            "HighlightedBackgroundHexColour": "",
            "HighlightedForegroundHexColour": "",
            "HighlightedText": null,
            "Categories": {
                "HasCategoryNames": false
            },
            "Sizes": null,
            "MembersPrice": "",
            "CustomerReviewSummary": {
                "AverageRating": 0.0,
                "NumberOfReviews": 0
            }
        },
        {
            "Name": "Chelsea Third Cup Stadium Shirt 2019-20 with Cuthbert 22 printing",
            "FullName": "Chelsea Third Cup Stadium Shirt 2019-20 with Cuthbert 22 printing",
            "Price": "$76.30",
            "Id": 1078866,
            "WasPrice": "$115.50",
            "ShowWasPrice": true,
            "ImageUrl": "//productview1.fanobject.com/0107/8866/01078866_00.jpg?imwidth=250",
            "Url": "/stores/chelsea/en/product/chelsea-third-cup-stadium-shirt-2019-20-with-cuthbert-22-printing/1078866",
            "HighlightedBackgroundHexColour": null,
            "HighlightedForegroundHexColour": null,
            "HighlightedText": null,
            "Categories": {
                "HasCategoryNames": false
            },
            "Sizes": null,
            "MembersPrice": "",
            "CustomerReviewSummary": {
                "AverageRating": 0.0,
                "NumberOfReviews": 0
            }
        },
        {
            "Name": "Chelsea Home Stadium Kit 2019-20 - Infants with Giroud 18 printing",
            "FullName": "Chelsea Home Stadium Kit 2019-20 - Infants with Giroud 18 printing",
            "Price": "$46.75",
            "Id": 1069209,
            "WasPrice": "$89.95",
            "ShowWasPrice": true,
            "ImageUrl": "//productview2.fanobject.com/0106/9209/01069209_00.jpg?imwidth=250",
            "Url": "/stores/chelsea/en/product/chelsea-home-stadium-kit-2019-20---infants-with-giroud-18-printing/1069209",
            "HighlightedBackgroundHexColour": null,
            "HighlightedForegroundHexColour": null,
            "HighlightedText": null,
            "Categories": {
                "HasCategoryNames": false
            },
            "Sizes": null,
            "MembersPrice": "",
            "CustomerReviewSummary": {
                "AverageRating": 0.0,
                "NumberOfReviews": 0
            }
        },
        {
            "Name": "Chelsea Third Cup Stadium Shirt 2019-20 with Zouma  15 printing",
            "FullName": "Chelsea Third Cup Stadium Shirt 2019-20 with Zouma  15 printing",
            "Price": "$76.30",
            "Id": 1078722,
            "WasPrice": "$115.50",
            "ShowWasPrice": true,
            "ImageUrl": "//productview2.fanobject.com/0107/8722/01078722_00.jpg?imwidth=250",
            "Url": "/stores/chelsea/en/product/chelsea-third-cup-stadium-shirt-2019-20-with-zouma--15-printing/1078722",
            "HighlightedBackgroundHexColour": null,
            "HighlightedForegroundHexColour": null,
            "HighlightedText": null,
            "Categories": {
                "HasCategoryNames": false
            },
            "Sizes": null,
            "MembersPrice": "",
            "CustomerReviewSummary": {
                "AverageRating": 0.0,
                "NumberOfReviews": 0
            }
        },
        {
            "Name": "Chelsea Third Cup Stadium Shirt 2019-20 with Barkley 8 printing",
            "FullName": "Chelsea Third Cup Stadium Shirt 2019-20 with Barkley 8 printing",
            "Price": "$76.30",
            "Id": 1078715,
            "WasPrice": "$115.50",
            "ShowWasPrice": true,
            "ImageUrl": "//productview2.fanobject.com/0107/8715/01078715_00.jpg?imwidth=250",
            "Url": "/stores/chelsea/en/product/chelsea-third-cup-stadium-shirt-2019-20-with-barkley-8-printing/1078715",
            "HighlightedBackgroundHexColour": null,
            "HighlightedForegroundHexColour": null,
            "HighlightedText": null,
            "Categories": {
                "HasCategoryNames": false
            },
            "Sizes": null,
            "MembersPrice": "",
            "CustomerReviewSummary": {
                "AverageRating": 0.0,
                "NumberOfReviews": 0
            }
        },
        {
            "Name": "Chelsea Third Cup Stadium Shirt 2019-20 with Bright 4 printing",
            "FullName": "Chelsea Third Cup Stadium Shirt 2019-20 with Bright 4 printing",
            "Price": "$76.30",
            "Id": 1078852,
            "WasPrice": "$115.50",
            "ShowWasPrice": true,
            "ImageUrl": "//productview1.fanobject.com/0107/8852/01078852_00.jpg?imwidth=250",
            "Url": "/stores/chelsea/en/product/chelsea-third-cup-stadium-shirt-2019-20-with-bright-4-printing/1078852",
            "HighlightedBackgroundHexColour": null,
            "HighlightedForegroundHexColour": null,
            "HighlightedText": null,
            "Categories": {
                "HasCategoryNames": false
            },
            "Sizes": null,
            "MembersPrice": "",
            "CustomerReviewSummary": {
                "AverageRating": 0.0,
                "NumberOfReviews": 0
            }
        }
    ],
    "MultipleSelectFilters": [
        {
            "Description": "Size",
            "IsVisible": true,
            "Type": "MultipleSelectSearchFilter",
            "Id": "sizeclothing_chelsea",
            "Options": [
                {
                    "Description": "12-13 Year",
                    "ProductCount": 6,
                    "Id": "sizeclothing_122d1320year",
                    "IsSelected": false
                },
                {
                    "Description": "7-8",
                    "ProductCount": 2,
                    "Id": "sizeclothing_72d8",
                    "IsSelected": false
                },
                {
                    "Description": "13-14",
                    "ProductCount": 1,
                    "Id": "sizeclothing_132d14",
                    "IsSelected": false
                },
                {
                    "Description": "13-15",
                    "ProductCount": 1,
                    "Id": "sizeclothing_132d15",
                    "IsSelected": false
                },
                {
                    "Description": "3-4",
                    "ProductCount": 1,
                    "Id": "sizeclothing_32d4",
                    "IsSelected": false
                },
                {
                    "Description": "9-10",
                    "ProductCount": 1,
                    "Id": "sizeclothing_92d10",
                    "IsSelected": false
                },
                {
                    "Description": "One Size O",
                    "ProductCount": 1,
                    "Id": "sizeclothing_one20size20o",
                    "IsSelected": false
                },
                {
                    "Description": "XS",
                    "ProductCount": 44,
                    "Id": "sizeclothing_xs",
                    "IsSelected": false
                },
                {
                    "Description": "S",
                    "ProductCount": 194,
                    "Id": "sizeclothing_s",
                    "IsSelected": false
                },
                {
                    "Description": "M",
                    "ProductCount": 141,
                    "Id": "sizeclothing_m",
                    "IsSelected": false
                }
            ],
            "LinkType": 1
        },
        {
            "Description": "Colour",
            "IsVisible": true,
            "Type": "ColourSearchFilter",
            "Id": "chelsea1",
            "Options": [
                {
                    "FriendlyName": "Blue",
                    "Description": "Blue",
                    "ProductCount": 110,
                    "Id": "r16",
                    "IsSelected": false
                },
                {
                    "FriendlyName": "Grey",
                    "Description": "Grey",
                    "ProductCount": 70,
                    "Id": "r169",
                    "IsSelected": false
                },
                {
                    "FriendlyName": "Navy",
                    "Description": "Navy",
                    "ProductCount": 59,
                    "Id": "r2389",
                    "IsSelected": false
                },
                {
                    "FriendlyName": "White",
                    "Description": "White",
                    "ProductCount": 19,
                    "Id": "r7",
                    "IsSelected": false
                },
                {
                    "FriendlyName": "Black",
                    "Description": "Black",
                    "ProductCount": 17,
                    "Id": "r2",
                    "IsSelected": false
                },
                {
                    "FriendlyName": "Yellow",
                    "Description": "Yellow",
                    "ProductCount": 13,
                    "Id": "r265",
                    "IsSelected": false
                },
                {
                    "FriendlyName": "Red",
                    "Description": "Red",
                    "ProductCount": 10,
                    "Id": "r119",
                    "IsSelected": false
                },
                {
                    "FriendlyName": "Pink",
                    "Description": "Pink",
                    "ProductCount": 10,
                    "Id": "r167",
                    "IsSelected": false
                },
                {
                    "FriendlyName": "Silver",
                    "Description": "Silver",
                    "ProductCount": 5,
                    "Id": "r740",
                    "IsSelected": false
                },
                {
                    "FriendlyName": "Gold",
                    "Description": "Gold",
                    "ProductCount": 3,
                    "Id": "r91",
                    "IsSelected": false
                },
                {
                    "FriendlyName": "Green",
                    "Description": "Green",
                    "ProductCount": 1,
                    "Id": "r118",
                    "IsSelected": false
                },
                {
                    "FriendlyName": "Multi-coloured",
                    "Description": "Multi-coloured",
                    "ProductCount": 1,
                    "Id": "r641",
                    "IsSelected": false
                }
            ],
            "LinkType": 0
        }
    ],
    "CategoryFilter": {
        "Id": "categories",
        "Description": "Shop By",
        "IsVisible": true,
        "Options": [
            {
                "Url": "/stores/chelsea/en/c/kits",
                "ProductCount": 1418,
                "UrlSafeName": "kits",
                "Description": "Kits"
            },
            {
                "Url": "/stores/chelsea/en/c/training",
                "ProductCount": 87,
                "UrlSafeName": "training",
                "Description": "Training"
            },
            {
                "Url": "/stores/chelsea/en/c/clothing",
                "ProductCount": 268,
                "UrlSafeName": "clothing",
                "Description": "Clothing"
            },
            {
                "Url": "/stores/chelsea/en/c/retro",
                "ProductCount": 10,
                "UrlSafeName": "retro",
                "Description": "Retro"
            },
            {
                "Url": "/stores/chelsea/en/c/equipment",
                "ProductCount": 118,
                "UrlSafeName": "equipment",
                "Description": "Equipment"
            },
            {
                "Url": "/stores/chelsea/en/c/homeware",
                "ProductCount": 99,
                "UrlSafeName": "homeware",
                "Description": "Homeware"
            },
            {
                "Url": "/stores/chelsea/en/c/gifts-&-souvenirs",
                "ProductCount": 165,
                "UrlSafeName": "gifts-&-souvenirs",
                "Description": "Gifts & Souvenirs"
            },
            {
                "Url": "/stores/chelsea/en/c/gifts",
                "ProductCount": 63,
                "UrlSafeName": "gifts",
                "Description": "Gifts"
            },
            {
                "Url": "/stores/chelsea/en/c/sale",
                "ProductCount": 142,
                "UrlSafeName": "sale",
                "Description": "SALE"
            },
            {
                "Url": "/stores/chelsea/en/c/features",
                "ProductCount": 395,
                "UrlSafeName": "features",
                "Description": "Features"
            }
        ]
    },
    "PriceFilter": {
        "Id": "chelsea_price",
        "MinAvailableValue": 0.0,
        "MaxAvailableValue": 346.0,
        "From": "From",
        "To": "To",
        "Description": "Price",
        "MinPriceDescription": "Min Price",
        "MaxPriceDescription": "Max Price"
    },
    "AlternativeSearchTerms": [],
    "BreadCrumbs": [],
    "SCodeScript": "",
    "TotalNumberOfItems": 2109,
    "SearchTerm": null,
    "CategoryTitle": "",
    "Categories": [],
    "ShowMemberPrices": false,
    "MonetateMethods": [
        {
            "method": "addCategories",
            "data": []
        },
        {
            "method": "addProducts",
            "data": [
                "1080300",
                "1080321",
                "1078711",
                "1078857",
                "264497",
                "1078866",
                "1069209",
                "1078722",
                "1078715",
                "1078852"
            ]
        },
        {
            "method": "setPageType",
            "data": "category"
        },
        {
            "method": "trackData",
            "data": null
        }
    ],
    "GoogleDataLayer": {
        "portal": null,
        "network": "KITBAG",
        "language": null,
        "currency": null,
        "location": null,
        "pageTitle": null,
        "testTransaction": false,
        "transactionId": null,
        "categoryList": "",
        "userId": null,
        "userEmail": null,
        "productId": 0,
        "md5Email": null,
        "firstName": null,
        "lastName": null,
        "title": null,
        "town": null,
        "county": null,
        "postCode": null,
        "country": null,
        "transactionEmail": null,
        "transactionDate": null,
        "transactionTotal": 0.0,
        "transactionDiscountTotal": 0.0,
        "TransactionTotalGBP": 0.0,
        "transactionSubTotalNetGBP": 0.0,
        "transactionShipping": 0.0,
        "transactionTax": 0.0,
        "transactionCurrency": null,
        "userExisting": false,
        "transactionProducts": null,
        "basketProducts": null,
        "viewedProducts": [
            {
                "id": "1080300",
                "name": "Chelsea Third Stadium Shirt 2019-20 with Gilmour 47 printing",
                "quantity": 1
            },
            {
                "id": "1080321",
                "name": "Chelsea Third Cup Stadium Shirt 2019-20 with Gilmour 47 printing",
                "quantity": 1
            },
            {
                "id": "1078711",
                "name": "Chelsea Third Cup Stadium Shirt 2019-20 with Marcos A. 3 printing",
                "quantity": 1
            },
            {
                "id": "1078857",
                "name": "Chelsea Third Cup Stadium Shirt 2019-20 with England 9 printing",
                "quantity": 1
            },
            {
                "id": "264497",
                "name": "Chelsea Third Stadium Shirt 2019-20",
                "quantity": 1
            },
            {
                "id": "1078866",
                "name": "Chelsea Third Cup Stadium Shirt 2019-20 with Cuthbert 22 printing",
                "quantity": 1
            },
            {
                "id": "1069209",
                "name": "Chelsea Home Stadium Kit 2019-20 - Infants with Giroud 18 printing",
                "quantity": 1
            },
            {
                "id": "1078722",
                "name": "Chelsea Third Cup Stadium Shirt 2019-20 with Zouma  15 printing",
                "quantity": 1
            },
            {
                "id": "1078715",
                "name": "Chelsea Third Cup Stadium Shirt 2019-20 with Barkley 8 printing",
                "quantity": 1
            },
            {
                "id": "1078852",
                "name": "Chelsea Third Cup Stadium Shirt 2019-20 with Bright 4 printing",
                "quantity": 1
            }
        ],
        "productViewed": null,
        "giftCertificate": null,
        "networkUid": null,
        "membershipNumber": null
    },
    "Translations": {}
}

Примечание: type(r) равно dict

Обновление за User-Comment:

import requests
import csv

data = "{\"searchTerm\":\"\",\"startIndex\":0,\"itemsToReturn\":\"60\",\"categories\":[\"kits\",\"away-kit\"],\"multiSelectFilters\":[],\"priceFilter\":{},\"generateSCodeScript\":true,\"showMoreFilters\":[]}"

r = requests.post(
    "https://www.chelseamegastore.com/stores/chelsea/en/Product/DoSearch", json=data).json()

with open("result.csv", 'w', newline="") as f:
    wrirer = csv.writer(f)
    wrirer.writerow(["Name", "Price", "ImageUrl"])
    for item in r["Products"]:
        wrirer.writerow([item["Name"], item["Price"],
                         f"https:{item['ImageUrl']}"])

print("Done")

Вывод: Просмотр онлайн

Вы можете найти запрос XHR в своем браузере developer-tools и затем перейдите на вкладку network, чтобы найти сделанные запросы. проверка

Добро пожаловать на сайт PullRequest, где вы можете задавать вопросы и получать ответы от других членов сообщества.
...