Хотя я написал сценарий для очистки данных с сайта, и он работает идеально, но после очистки данных около 18 страниц (так как существует около 42 страниц), зацикливание застревает, давая информацию журнала после и после.
Я посетил похожие вопросы с ответами на stackoverflow, но во всех них сценарии не работали с самого начала, в то время как в моем случае сценарий собирал данные с 18 страниц, а затем зависал.
Вот скрипт
# -*- coding: utf-8 -*-
import scrapy
import logging
class KhaadiSpider(scrapy.Spider):
name = 'khaadi'
start_urls = ['https://www.khaadi.com/pk/woman.html/']
def parse(self, response):
urls= response.xpath('//ol/li/div/a/@href').extract()
for url in urls:
yield scrapy.Request(url, callback=self.product_page)
next_page=response.xpath('//*[@class="action next"]/@href').extract_first()
while(next_page!=None):
yield scrapy.Request(next_page)
logging.info("Scraped all the pages Successfuly....")
def product_page(self,response):
image= response.xpath('//*[@class="MagicZoom"]/@href').extract_first()
page_title= response.xpath('//*[@class="page-title"]/span/text()').extract_first()
price=response.xpath('//*[@class="price"]/text()').extract_first()
page_url=response.url
yield {'Image':image,
"Page Title":page_title,
"Price":price,
"Page Url":page_url
}
Это информация регистратора
2019-10-05 11:22:06 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.khaadi.com/pk/ksffs19301-blue.html> (referer: https://www.khaadi.com/pk/woman.html?p=18)
2019-10-05 11:22:06 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.khaadi.com/pk/b19428-pink-3pc.html>
{'Image': u'https://www.khaadi.com/media/catalog/product/cache/10f519365b01716ddb90abc57de5a837/b/1/b19428b.jpg', 'Page Title': u'Shirt Shalwar Dupatta', 'Page Url': 'https://www.khaadi.com/pk/b19428-pink-3pc.html', 'Price': u'PKR2,170'}
2019-10-05 11:22:06 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.khaadi.com/pk/i19417-blue-2pc.html>
{'Image': u'https://www.khaadi.com/media/catalog/product/cache/10f519365b01716ddb90abc57de5a837/i/1/i19417b.jpg', 'Page Title': u'Shirt Shalwar', 'Page Url': 'https://www.khaadi.com/pk/i19417-blue-2pc.html', 'Price': u'PKR1,680'}
2019-10-05 11:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.khaadi.com/pk/wshe19498-off-white.html>
{'Image': u'https://www.khaadi.com/media/catalog/product/cache/10f519365b01716ddb90abc57de5a837/w/s/wshe19498_2_.jpg', 'Page Title': u'Embroidered Shalwar', 'Page Url': 'https://www.khaadi.com/pk/wshe19498-off-white.html', 'Price': u'PKR1,800'}
2019-10-05 11:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.khaadi.com/pk/wet19401-off-white.html>
{'Image': u'https://www.khaadi.com/media/catalog/product/cache/10f519365b01716ddb90abc57de5a837/w/e/wet19401_offwhite__1_.jpg', 'Page Title': u'EMBELLISHED TIGHTS', 'Page Url': 'https://www.khaadi.com/pk/wet19401-off-white.html', 'Price': u'PKR1,000'}
2019-10-05 11:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.khaadi.com/pk/wet19402-black.html>
{'Image': u'https://www.khaadi.com/media/catalog/product/cache/10f519365b01716ddb90abc57de5a837/w/e/wet19402_black__2_.jpg', 'Page Title': u'EMBELLISHED TIGHTS', 'Page Url': 'https://www.khaadi.com/pk/wet19402-black.html', 'Price': u'PKR1,000'}
2019-10-05 11:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.khaadi.com/pk/k19407-yellow-3pc.html>
{'Image': u'https://www.khaadi.com/media/catalog/product/cache/10f519365b01716ddb90abc57de5a837/k/1/k19407b.jpg', 'Page Title': u'Shirt Shalwar Dupatta', 'Page Url': 'https://www.khaadi.com/pk/k19407-yellow-3pc.html', 'Price': u'PKR2,940'}
2019-10-05 11:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.khaadi.com/pk/k19408-blue-3pc.html>
{'Image': u'https://www.khaadi.com/media/catalog/product/cache/10f519365b01716ddb90abc57de5a837/k/1/k19408a.jpg', 'Page Title': u'Shirt Shalwar Dupatta', 'Page Url': 'https://www.khaadi.com/pk/k19408-blue-3pc.html', 'Price': u'PKR2,940'}
2019-10-05 11:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.khaadi.com/pk/wet19408-pink.html>
{'Image': u'https://www.khaadi.com/media/catalog/product/cache/10f519365b01716ddb90abc57de5a837/w/e/wet19408_pink__1_.jpg', 'Page Title': u'EMBELLISHED TIGHTS', 'Page Url': 'https://www.khaadi.com/pk/wet19408-pink.html', 'Price': u'PKR1,000'}
2019-10-05 11:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.khaadi.com/pk/wbme19474-off-white.html>
{'Image': u'https://www.khaadi.com/media/catalog/product/cache/10f519365b01716ddb90abc57de5a837/w/b/wbme19474_offwhite__1_.jpg', 'Page Title': u'Embroidered Metallica Pants', 'Page Url': 'https://www.khaadi.com/pk/wbme19474-off-white.html', 'Price': u'PKR2,400'}
2019-10-05 11:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.khaadi.com/pk/ksffs19301-blue.html>
{'Image': u'https://www.khaadi.com/media/catalog/product/cache/10f519365b01716ddb90abc57de5a837/k/s/ksffs19301_blue__2_.jpg', 'Page Title': u'Semi Formal Full Suit', 'Page Url': 'https://www.khaadi.com/pk/ksffs19301-blue.html', 'Price': u'PKR18,000'}
2019-10-05 11:22:24 [scrapy.extensions.logstats] INFO: Crawled 326 pages (at 326 pages/min), scraped 307 items (at 307 items/min)
2019-10-05 11:23:24 [scrapy.extensions.logstats] INFO: Crawled 326 pages (at 0 pages/min), scraped 307 items (at 0 items/min)
2019-10-05 11:24:24 [scrapy.extensions.logstats] INFO: Crawled 326 pages (at 0 pages/min), scraped 307 items (at 0 items/min)
2019-10-05 11:25:24 [scrapy.extensions.logstats] INFO: Crawled 326 pages (at 0 pages/min), scraped 307 items (at 0 items/min)
2019-10-05 11:26:24 [scrapy.extensions.logstats] INFO: Crawled 326 pages (at 0 pages/min), scraped 307 items (at 0 items/min)
2019-10-05 11:27:24 [scrapy.extensions.logstats] INFO: Crawled 326 pages (at 0 pages/min), scraped 307 items (at 0 items/min)
2019-10-05 11:28:24 [scrapy.extensions.logstats] INFO: Crawled 326 pages (at 0 pages/min), scraped 307 items (at 0 items/min)
2019-10-05 11:29:24 [scrapy.extensions.logstats] INFO: Crawled 326 pages (at 0 pages/min), scraped 307 items (at 0 items/min)
2019-10-05 11:30:24 [scrapy.extensions.logstats] INFO: Crawled 326 pages (at 0 pages/min), scraped 307 items (at 0 items/min)
2019-10-05 11:31:24 [scrapy.extensions.logstats] INFO: Crawled 326 pages (at 0 pages/min), scraped 307 items (at 0 items/min)
2019-10-05 11:32:24 [scrapy.extensions.logstats] INFO: Crawled 326 pages (at 0 pages/min), scraped 307 items (at 0 items/min)
2019-10-05 11:33:24 [scrapy.extensions.logstats] INFO: Crawled 326 pages (at 0 pages/min), scraped 307 items (at 0 items/min)
2019-10-05 11:34:24 [scrapy.extensions.logstats] INFO: Crawled 326 pages (at 0 pages/min), scraped 307 items (at 0 items/min)
2019-10-05 11:35:24 [scrapy.extensions.logstats] INFO: Crawled 326 pages (at 0 pages/min), scraped 307 items (at 0 items/min)
2019-10-05 11:36:24 [scrapy.extensions.logstats] INFO: Crawled 326 pages (at 0 pages/min), scraped 307 items (at 0 items/min)
Все остальные файлы остались по умолчанию.