You can access the JSON response for each page directly. Keep in mind, though, that each page holds only 32 products, which means you will have to make 659 requests.
import requests
import math

url = 'https://middleware.paytmmall.com/fmcg-foods-glpid-101405'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}
payload = {
    'channel': 'web',
    'child_site_id': '6',
    'site_id': '2',
    'version': '2',
    'discoverability': 'online',
    'use_mw': '1',
    'category': '101405',
    'page': '1',
    'page_count': '1',
    'items_per_page': '32'}

# Get total pages needed
jsonData = requests.post(url, headers=headers, data=payload).json()
total_count = jsonData['totalCount']
total_pages = total_count / 32
pages = math.ceil(total_pages)

# Iterate through each page
for page in range(1, pages + 1):
    payload.update({'page': page, 'page_count': page})
    jsonData = requests.post(url, headers=headers, data=payload).json()

    for product in jsonData['grid_layout']:
        name = product['name']
        brand = product['brand']
        actual_price = product['actual_price']

        # Not every product carries a 'type' attribute
        try:
            category = product['attributes']['type']
        except (KeyError, TypeError):
            category = 'N/A'

        print('%-20s ₹%-5s %-20s %s' % (category, actual_price, brand, name))
Output:
Tea ₹185 Red Label Red Label Tea 500 gm
Tea ₹93 Tata Tea Premium Tata Tea Premium Leaf 250 gm
Tea ₹240 Red Label Red Label Natural Care Tea 500 gm
N/A ₹230 Taj Mahal Taj Mahal Tea 500 gm
Tea ₹120 Red Label Red Label Natural Care Tea 250 gm
Dairy Whitener ₹413 Nestle Nestle Everyday Dairy Whitener Milk 1 kg
Sauces ₹125 Kissan Kissan Fresh Tomato Ketchup 950 gm
Whole Oats ₹186 Quaker Quaker Oats 1 kg Pouch
Tea ₹188 Tata Tea Premium Tata Tea Premium Leaf 500 gm
Coffee ₹90 Bru BRU Instant Coffee 50 gm
Almond ₹300 Freshco Freshco California Almonds 200Gm
Jam ₹250 Kissan Kissan Mixed Fruit Jam 1.04 kg
Almond ₹799 glomin Glomin California Almond Raw 500 G 1Pc
Sauces ₹152 Kissan Kissan Sweet & Spicy Sauce 1 kg
Cashew Nut ₹180 Nutty Gritties Nutty Gritties Roasted Salted Cashews 80G
Coffee ₹120 Bru BRU Gold Instant Coffee 50 gm
Tea ₹480 Red Label Red Label Natural Care Tea 1 kg
Almond ₹310 Miltop Miltop California Almonds 250G
Cashew Nut ₹425 glomin Glomin Cashew 250 G 1Pc
Almond ₹600 Wonderland Wonderland California Almond 500g
Almond ₹499 Shivram Peshawari & Bros Shivram Peshawari & Bros California Almonds/Badam 250 Grams
Peanut Butter ₹425 Pintola Pintola All Natural Peanut Butter 1 kg (Crunchy)
Soups ₹55 Knorr Knorr Classic Tomato Soup 53 gm
Peanut Butter ₹425 Pintola Pintola All Natural Peanut Butter 1 kg (Creamy)
Peanut Butter ₹349 Pintola Pintola Classic Peanut Butter 1 kg (Crcuncy)
Peanut Butter ₹165 Pintola Pintola All Natural Peanut Butter 350 gm (Crunchy)
Almond ₹1599 glomin Glomin Raw Almonds 1Kg (Pack Of 1)
Almond ₹150 Nutty Gritties Nutty Gritties Almonds 100G
Raisin ₹250 OOSH Oosh Seedless Black Raisin 250G
N/A ₹455 Taj Mahal Taj Mahal Tea 1 kg
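The page count above is just the total number of products divided by the page size, rounded up. Here is a minimal sketch of that arithmetic on its own. The helpers `page_count` and `page_payloads` are hypothetical names, not part of the site's API, and the total of 21,088 is only an illustrative value; the real figure is whatever `totalCount` returns.

```python
import math

def page_count(total_count, items_per_page=32):
    """Number of pages needed to cover total_count items (hypothetical helper)."""
    return math.ceil(total_count / items_per_page)

def page_payloads(base_payload, pages):
    """Yield a fresh payload dict per page, leaving base_payload untouched."""
    for page in range(1, pages + 1):
        yield {**base_payload, 'page': page, 'page_count': page}

# Illustrative total only; any total from 21,057 to 21,088 gives 659 pages.
pages = page_count(21088)
payloads = list(page_payloads({'category': '101405', 'items_per_page': '32'}, pages))
print(pages, payloads[0]['page'], payloads[-1]['page'])
```

Building a new dict per page, rather than calling `payload.update(...)`, keeps the base payload unchanged, which can make it easier to retry a single page later.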
Edit:
If you want the full category hierarchy, you will need to follow each product's link and extract it from there. I have provided code for that, but keep in mind that it will take FOREVER: with 659 pages of up to 32 products each and about 2-3 seconds per request, expect up to roughly 18 hours.
# Iterate through each page
for page in range(1, pages + 1):
    payload.update({'page': page, 'page_count': page})
    jsonData = requests.post(url, headers=headers, data=payload).json()

    for product in jsonData['grid_layout']:
        name = product['name']
        brand = product['brand']
        actual_price = product['actual_price']
        img = product['image_url']
        category_id = product['category_id']
        new_url = product['newurl']

        # Follow the product link to get its category hierarchy
        jsonData_product = requests.get(new_url, headers=headers).json()
        category = '/'.join([each['name'] for each in jsonData_product['ancestors']])

        print('Name: %s\nImage: %s\nCategory: %s\n' % (name, img, category))
Output:
Name: Red Label Tea 500 gm
Image: https://assetscdn1.paytm.com/images/catalog/product/F/FA/FASRED-LABEL-TETBL497475164B959/a_4.jpg
Category: Supermarket/Foods/Drinks & Beverages/Tea & Coffee/Red Label Tea 500 gm
Name: Tata Tea Premium Leaf 250 gm
Image: https://assetscdn1.paytm.com/images/catalog/product/F/FA/FASTATA-TEA-PREINNO985832A1E145F5/8.jpg
Category: Supermarket/Foods/Drinks & Beverages/Tea & Coffee/Tata Tea Premium Leaf 250 gm
Name: Red Label Natural Care Tea 500 gm
Image: https://assetscdn1.paytm.com/images/catalog/product/F/FA/FASRLNC-C-500GNTBL4974726639099/a_14.jpg
Category: Supermarket/Foods/Drinks & Beverages/Tea & Coffee/Red Label Tea & Coffee 500 Gm
Name: Taj Mahal Tea 500 gm
Image: https://assetscdn1.paytm.com/images/catalog/product/F/FA/FASTAJ-MAHAL-TEBIGB985832F0512392/0.jpg
Category: Supermarket/Foods/Drinks & Beverages/Tea & Coffee/Taj Mahal Tea 500 gm
Name: Red Label Natural Care Tea 250 gm
Image: https://assetscdn1.paytm.com/images/catalog/product/F/FA/FASNEW-RED-LABETBL49747FC4B364F/a_7.jpg
Category: Supermarket/Foods/Drinks & Beverages/Tea & Coffee/Red Label Natural Care Tea 250 gm
Name: Nestle Everyday Dairy Whitener Milk 1 kg
Image: https://assetscdn1.paytm.com/images/catalog/product/F/FA/FASNESTLE-EVERYTBL497478E1F2966/a_8.jpg
Category: Supermarket/Foods/Dairy Products/Dairy Whitener/Nestle Everyday Dairy Whitener Milk 1 kg
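To see where the running-time estimate comes from, here is a rough back-of-the-envelope sketch. `estimate_hours` is a hypothetical helper; it assumes one sequential detail request per product, treats every page as full, and ignores the 659 listing requests themselves.

```python
def estimate_hours(pages, items_per_page, seconds_per_request):
    """Rough wall-clock time for one detail request per product, run sequentially."""
    requests_needed = pages * items_per_page  # upper bound: last page may be partial
    return requests_needed * seconds_per_request / 3600

low = estimate_hours(659, 32, 2)
high = estimate_hours(659, 32, 3)
print('%.1f to %.1f hours' % (low, high))  # prints "11.7 to 17.6 hours"
```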
OR
If all products belong to the same categories, you only really need to fetch the categories for the first product and then apply them to all the others as you iterate through the pages:
import requests
import math

url = 'https://middleware.paytmmall.com/fmcg-foods-glpid-101405'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}
payload = {
    'channel': 'web',
    'child_site_id': '6',
    'site_id': '2',
    'version': '2',
    'discoverability': 'online',
    'use_mw': '1',
    'category': '101405',
    'page': '1',
    'page_count': '1',
    'items_per_page': '32'}

# Get total pages needed
jsonData = requests.post(url, headers=headers, data=payload).json()
total_count = jsonData['totalCount']
total_pages = total_count / 32
pages = math.ceil(total_pages)

# Iterate through each page
category = ''
for page in range(1, pages + 1):
    payload.update({'page': page, 'page_count': page})
    jsonData = requests.post(url, headers=headers, data=payload).json()

    for product in jsonData['grid_layout']:
        name = product['name']
        brand = product['brand']
        actual_price = product['actual_price']
        img = product['image_url']
        category_id = product['category_id']

        # Fetch the hierarchy only once, for the first product
        if category == '':
            new_url = product['newurl']
            jsonData_product = requests.get(new_url, headers=headers).json()
            # Drop the last ancestor, which is the product's own name
            category = '/'.join([each['name'] for each in jsonData_product['ancestors']][:-1])

        print('Name: %s\nImage: %s\nCategory: %s\n' % (name, img, category))
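A possible middle ground between the two approaches, if the products do not all share one hierarchy, is to fetch the hierarchy once per distinct `category_id` instead of once overall. The sketch below is not from the original answer. It assumes, without confirmation from the API, that products with the same `category_id` share the same ancestor path. `make_category_lookup` and `fake_fetch` are hypothetical names, and the fake fetcher stands in for `requests.get(new_url, headers=headers).json()` so the example runs offline.

```python
def make_category_lookup(fetch_json):
    """Return a function that resolves a product's category path,
    calling fetch_json(url) at most once per distinct category_id."""
    cache = {}

    def lookup(product):
        key = product['category_id']
        if key not in cache:
            detail = fetch_json(product['newurl'])
            # Drop the last ancestor, which is the product's own name.
            names = [each['name'] for each in detail['ancestors']][:-1]
            cache[key] = '/'.join(names)
        return cache[key]

    return lookup


# Stand-in for a real HTTP call; records which URLs were requested.
calls = []

def fake_fetch(url):
    calls.append(url)
    return {'ancestors': [{'name': 'Supermarket'}, {'name': 'Foods'}, {'name': 'Item'}]}

lookup = make_category_lookup(fake_fetch)
products = [
    {'category_id': 1, 'newurl': 'https://example.invalid/a'},
    {'category_id': 1, 'newurl': 'https://example.invalid/b'},
    {'category_id': 2, 'newurl': 'https://example.invalid/c'},
]
paths = [lookup(p) for p in products]
print(paths, len(calls))
```

With three products in two categories, only two detail requests are made. If the assumption about `category_id` does not hold for this site, the per-product version is the safe fallback.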