How do I use a logical OR in Scrapy?
30 September 2018

I want to use two rules in a spider and combine them with a logical OR (||).

The code looks like this:

for urlrule in urlrules:
    if urlrule['rule'] is not 'nan':
        allSpider.rules = [Rule(LinkExtractor(allow=(urlrule['rule'],), ), callback="parse_items", follow=True)]
    elif urlrule['restrictXP'] is not 'nan':
        allSpider.rules = [Rule(LinkExtractor(restrict_xpaths=urlrule['restrictXP']), callback='parse_items', follow=True)]
    else:
        print('Undefined Rule!')
        break

The value tested in if urlrule['rule'] is not 'nan' is read from a CSV file.

The problem is that only the first branch of the if is ever taken. When I run the spider, it returns the following:

Unhandled error in Deferred:
2018-09-30 13:18:58 [twisted] CRITICAL: 
Unhandled error in Deferred:

2018-09-30 13:18:58 [twisted] CRITICAL: 
Traceback (most recent call last):
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
    result = g.send(result)
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/scrapy/crawler.py", line 98, in crawl
    six.reraise(*exc_info)
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/scrapy/crawler.py", line 79, in crawl
    self.spider = self._create_spider(*args, **kwargs)
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/scrapy/crawler.py", line 102, in _create_spider
    return self.spidercls.from_crawler(self, *args, **kwargs)
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/scrapy/spiders/crawl.py", line 100, in from_crawler
    spider = super(CrawlSpider, cls).from_crawler(crawler, *args, **kwargs)
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/scrapy/spiders/__init__.py", line 51, in from_crawler
    spider = cls(*args, **kwargs)
  File "/home/reyhaneh/PycharmProjects/total/total.py", line 25, in __init__
    allSpider.rules = [Rule(LinkExtractor(allow=(urlrule['rule'],), ), callback="parse_items", follow=True)]
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/scrapy/linkextractors/lxmlhtml.py", line 116, in __init__
    canonicalize=canonicalize, deny_extensions=deny_extensions)
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/scrapy/linkextractors/__init__.py", line 57, in __init__
    for x in arg_to_iter(allow)]
  File "/usr/lib/python2.7/re.py", line 194, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python2.7/re.py", line 247, in _compile
    raise TypeError, "first argument must be string or compiled pattern"
TypeError: first argument must be string or compiled pattern
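
A possible reading of this TypeError, offered as a guess rather than a diagnosis: if the CSV is loaded with something like pandas (an assumption, the question does not say), an empty cell arrives as a float NaN, not as the string 'nan'. The check against 'nan' then passes, and the float reaches re.compile. A minimal sketch of that situation, using a hand-made NaN in place of the real CSV value:

```python
import math
import re

# Assumption: this stands in for an empty CSV cell loaded as float NaN.
cell = float("nan")

# A float never equals the string 'nan', so a test like the one in the
# question treats the missing value as if it were defined.
looks_defined = cell != "nan"

# math.isnan is a reliable way to detect a float NaN.
really_missing = math.isnan(cell)

# Per the traceback, LinkExtractor ends up passing each allow value to
# re.compile, which rejects anything that is not a string or pattern.
try:
    re.compile(cell)
    compile_error = None
except TypeError as exc:
    compile_error = str(exc)
```

If that assumption holds, the first branch would be taken for every row, which would also match the observation that only the first part of the if is considered.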

Как я могу это исправить?
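
One way this could be approached, as a sketch under assumptions rather than a confirmed fix: a CrawlSpider applies every Rule in its rules list to each response, so two separate Rule objects behave like a logical OR, whereas allow and restrict_xpaths inside a single LinkExtractor both have to match. The loop in the question also overwrites allSpider.rules on each pass, so only the last assignment survives. The helper names below (is_missing, build_extractor_kwargs, make_rules) are hypothetical, and urlrules is assumed to be an iterable of dict-like rows with the keys 'rule' and 'restrictXP' as in the question.

```python
import math


def is_missing(value):
    """True for None, a float NaN, or the placeholder strings '' / 'nan'."""
    if value is None:
        return True
    if isinstance(value, float):
        return math.isnan(value)
    return str(value).strip().lower() in ("", "nan")


def build_extractor_kwargs(urlrules):
    """Return one dict of LinkExtractor keyword arguments per usable condition.

    Two independent `if` checks (not if/elif) let a single row contribute
    both an `allow` condition and a `restrict_xpaths` condition.
    """
    specs = []
    for urlrule in urlrules:
        if not is_missing(urlrule.get("rule")):
            specs.append({"allow": (urlrule["rule"],)})
        if not is_missing(urlrule.get("restrictXP")):
            specs.append({"restrict_xpaths": (urlrule["restrictXP"],)})
    return specs


def make_rules(urlrules):
    """Build Scrapy Rule objects; Scrapy is imported only when this is called."""
    from scrapy.linkextractors import LinkExtractor
    from scrapy.spiders import Rule

    return [
        Rule(LinkExtractor(**kwargs), callback="parse_items", follow=True)
        for kwargs in build_extractor_kwargs(urlrules)
    ]
```

The resulting list would need to be assigned to rules before CrawlSpider.__init__ runs, since that is where the rules are compiled. If a link matches more than one rule, Scrapy uses the first matching rule in the list.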

...