Python - Scrapy code works on one site but not on another with adjusted selectors - PullRequest
1 vote
/ May 8, 2020

I'm just learning Scrapy and Python, and I've run into this problem.

When scraping this website: http://www.laughfactory.com/jokes/family-jokes the code works fine.

import scrapy


class JokesSpider(scrapy.Spider):
    name = 'jokes'
    allowed_domains = ['www.laughfactory.com']
    start_urls = ["http://www.laughfactory.com/jokes/family-jokes"]

    def parse(self, response):
        for joke in response.xpath("//div[@class='jokes']"):
            yield {
                'joke_text': joke.xpath(".//div[@class='joke-text']").extract_first()
            }

When I use similar code on another website: https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077 the code is:

from time import sleep

import scrapy


class eKupiSingleCategoryXPath(scrapy.Spider):
    name = "monitor_xpath"
    allowed_domains = ["https://www.ekupi.hr/hr/"]
    start_urls = ["https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077"]

    def parse(self, response):
        for monitorSelectXPath in response.xpath("//div[@class='details']"):
            sleep(1)

            yield {
                "name": monitorSelectXPath.xpath("//a[@class='name']/text()").extract_first()
            }

I believe I'm using the right selectors, and I believe the code itself is fine, since it works with CSS selectors. With XPath selectors, the output is always the same item.

The output is below:

2020-05-07 23:04:17 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:18 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:19 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:20 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:22 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:23 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:24 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:28 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:31 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:33 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:34 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:35 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:38 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:39 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:41 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:41 [scrapy.core.engine] INFO: Closing spider (finished)

1 Answer

0 votes
/ May 8, 2020

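Remove the leading // from the XPath expression. An expression that starts with // is evaluated from the document root even when it is called on a sub-selector, so every iteration returns the first a[@class='name'] on the page. Update the yield statement as shown below. Note that this form matches only an a element that is a direct child of the div; write .//a[@class='name'] to match at any depth inside it.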

yield {
    "name": monitorSelectXPath.xpath("a[@class='name']/text()").extract_first()
}
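
To see why the scope of the search matters, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors. The markup and product names are invented for illustration, not taken from the real page. ElementTree does not accept a leading // on an element, so the "search from the document root" behaviour is imitated by querying the root node inside the loop.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the product listing page.
PAGE = """
<html><body>
  <div class="details"><a class="name">Monitor A</a></div>
  <div class="details"><a class="name">Monitor B</a></div>
  <div class="details"><a class="name">Monitor C</a></div>
</body></html>
"""

root = ET.fromstring(PAGE)
cards = root.findall(".//div[@class='details']")

# Searching from the document root on every iteration (the effect of a
# leading // in Scrapy) always finds the first link on the page.
from_root = [root.find(".//a[@class='name']").text for _ in cards]

# Searching relative to the current card finds that card's own link.
from_card = [card.find(".//a[@class='name']").text for card in cards]

print(from_root)
print(from_card)
```

In the spider, the relative search corresponds to calling xpath on the loop variable with an expression that starts with a[...] or .//a[...] rather than //a[...].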

The Scrapy shell also lets you test your selectors interactively. The terminal command is below:

scrapy shell https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077
...