I am trying to build a simple web crawler that prints the URL of every Legion product listed on amazon.in when the search keyword is "legion". I am using the following code:
import requests
from bs4 import BeautifulSoup

def legion_spider(max_pages):
    page = 1
    while page <= max_pages:
        url = 'https://www.amazon.in/s?k=legion&qid=1588862016&swrs=82DF79C1243AF6D61651CCAA9F883EC4&ref=sr_pg_' + str(page)
        source_code = requests.get(url)
        plain_txt = source_code.text
        soup = BeautifulSoup(plain_txt)
        for link in soup.findAll('a', {'class': 'a-size-medium a-color-base a-text-normal'}):
            href = link.get('href')
            print(href)
        page += 1

legion_spider(1)
and I get the following output:
C:\Users\lenovo\AppData\Local\Programs\Python\Python38-32\python.exe "E:/Python Practice/web_crawler.py"
E:/Python Practice/web_crawler.py:10: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
The code that caused this warning is on line 10 of the file E:/Python Practice/web_crawler.py. To get rid of this warning, pass the additional argument 'features="html.parser"' to the BeautifulSoup constructor.
  soup = BeautifulSoup(plain_txt)
Process finished with exit code 0
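
For reference, the fix suggested by the warning can be sketched as below. This is only a minimal illustration, not a confirmed solution to the missing links: the helper name `extract_product_links` and the `SAMPLE_HTML` string (including its `href` values) are hypothetical, and the sketch parses a static string so that no request to amazon.in is made. It uses a CSS selector instead of the exact class string, on the assumption that the anchor should carry all three classes regardless of their order.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a search results page (not real Amazon HTML).
SAMPLE_HTML = """
<div>
  <a class="a-size-medium a-color-base a-text-normal" href="/legion-5/dp/EXAMPLE1">Legion 5</a>
  <a class="a-link-normal" href="/help">Help</a>
  <a class="a-size-medium a-color-base a-text-normal" href="/legion-7/dp/EXAMPLE2">Legion 7</a>
</div>
"""


def extract_product_links(html):
    """Return the href of every <a> tag that carries all three classes."""
    # Naming the parser explicitly is what the UserWarning asks for.
    soup = BeautifulSoup(html, "html.parser")
    # The CSS selector matches anchors that have all three classes, in any order.
    anchors = soup.select("a.a-size-medium.a-color-base.a-text-normal")
    return [a.get("href") for a in anchors]


if __name__ == "__main__":
    for href in extract_product_links(SAMPLE_HTML):
        print(href)
```

If the same function returns an empty list for the live page, printing `source_code.status_code` and the first few hundred characters of `source_code.text` would show what the server actually sent back.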