Problem connecting Scrapyd client to MongoDB Atlas: pymongo.errors.ServerSelectionTimeoutError

I am trying to deploy a spider to an AWS EC2 instance; the spider dumps the scraped information into MongoDB Atlas. The spider itself is fine and works as expected (adds new information to the db) when I run scrapy crawl <spidername> locally, but it does not crawl when invoked on the EC2 instance. My problems are:

  1. I have not been able to run this spider from my EC2 instance
  2. I do not know how to schedule the spider once it is actually deployed, and could use some advice (see the sketch after this list)
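
For problem 2, the usual approach (and what the curl call below already does) is to hit the Scrapyd daemon's schedule.json endpoint from whatever scheduler is convenient, e.g. a cron job. A minimal Python sketch, assuming the Scrapyd default port 6800 and the placeholder hostname my-ec2-url:

    import requests

    # Ask Scrapyd on the EC2 instance to schedule one run of the spider.
    # "my-ec2-url" is a placeholder for the instance's public DNS name.
    resp = requests.post(
        "http://my-ec2-url:6800/schedule.json",
        data={"project": "jneuro", "spider": "jneuro_spider"},
    )
    print(resp.json())  # e.g. {"status": "ok", "jobid": "..."}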

After running curl (my-ec2-url):6800/schedule.json -d project=jneuro -d spider=jneuro_spider on my local machine, my log shows:

2020-04-30 19:55:39 [scrapy.utils.log] INFO: Scrapy 2.1.0 started (bot: jneurosci)
2020-04-30 19:55:39 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 20.3.0, Python 3.6.10 (default, Feb 10 2020, 19:55:14) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1g  21 Apr 2020), cryptography 2.9.2, Platform Linux-4.14.171-105.231.amzn1.x86_64-x86_64-with-glibc2.3.4
2020-04-30 19:55:39 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2020-04-30 19:55:39 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'jneurosci',
 'LOG_FILE': 'logs/jneuro/jneuro/8e4ee02c8b1c11eab8350a2092b3924a.log',
 'NEWSPIDER_MODULE': 'jneurosci.spiders',
 'ROBOTSTXT_OBEY': True,
 'SPIDER_MODULES': ['jneurosci.spiders']}
2020-04-30 19:55:39 [scrapy.extensions.telnet] INFO: Telnet Password: 3478769ec93271ab
2020-04-30 19:55:39 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2020-04-30 19:55:39 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2020-04-30 19:55:39 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-04-30 19:56:12 [twisted] CRITICAL: Unhandled error in Deferred:
2020-04-30 19:56:12 [twisted] CRITICAL: 
Traceback (most recent call last):
  File "/home/ec2-user/.local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/home/ec2-user/.local/lib/python3.6/site-packages/scrapy/crawler.py", line 87, in crawl
    self.engine = self._create_engine()
  File "/home/ec2-user/.local/lib/python3.6/site-packages/scrapy/crawler.py", line 101, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/home/ec2-user/.local/lib/python3.6/site-packages/scrapy/core/engine.py", line 70, in __init__
    self.scraper = Scraper(crawler)
  File "/home/ec2-user/.local/lib/python3.6/site-packages/scrapy/core/scraper.py", line 71, in __init__
    self.itemproc = itemproc_cls.from_crawler(crawler)
  File "/home/ec2-user/.local/lib/python3.6/site-packages/scrapy/middleware.py", line 53, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/home/ec2-user/.local/lib/python3.6/site-packages/scrapy/middleware.py", line 34, in from_settings
    mwcls = load_object(clspath)
  File "/home/ec2-user/.local/lib/python3.6/site-packages/scrapy/utils/misc.py", line 50, in load_object
    mod = import_module(module)
  File "/usr/lib64/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
  File "/tmp/jneuro-1588273690-fp_er4rk.egg/jneurosci/pipelines.py", line 7, in <module>
  File "/home/ec2-user/.local/lib/python3.6/site-packages/pymongo/collection.py", line 1103, in drop
    dbo.drop_collection(self.__name, session=session)
  File "/home/ec2-user/.local/lib/python3.6/site-packages/pymongo/database.py", line 914, in drop_collection
    with self.__client._socket_for_writes(session) as sock_info:
  File "/home/ec2-user/.local/lib/python3.6/site-packages/pymongo/mongo_client.py", line 1266, in _socket_for_writes
    server = self._select_server(writable_server_selector, session)
  File "/home/ec2-user/.local/lib/python3.6/site-packages/pymongo/mongo_client.py", line 1253, in _select_server
    server = topology.select_server(server_selector)
  File "/home/ec2-user/.local/lib/python3.6/site-packages/pymongo/topology.py", line 235, in select_server
    address))
  File "/home/ec2-user/.local/lib/python3.6/site-packages/pymongo/topology.py", line 193, in select_servers
    selector, server_timeout, address)
  File "/home/ec2-user/.local/lib/python3.6/site-packages/pymongo/topology.py", line 209, in _select_servers_loop
    self._error_message(selector))
pymongo.errors.ServerSelectionTimeoutError: cluster0-shard-00-01-ssnba.mongodb.net:27017: timed out,cluster0-shard-00-00-ssnba.mongodb.net:27017: timed out,cluster0-shard-00-02-ssnba.mongodb.net:27017: timed out

...
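
For reference, the frame the traceback points at (pipelines.py, line 7, in <module>) is a module-level pymongo call, so the top of pipelines.py presumably looks roughly like the sketch below; the URI and the database/collection names are placeholders, not the real project values:

    # pipelines.py -- rough reconstruction of the top of the file, based only on
    # the traceback above; URI, database and collection names are placeholders.
    import pymongo

    MONGO_URI = "mongodb+srv://<user>:<password>@<cluster-host>/<db>"

    client = pymongo.MongoClient(MONGO_URI)   # constructing the client is lazy
    db = client["jneuro"]
    # drop() needs a live server, so it is here, at import time, that
    # pymongo.errors.ServerSelectionTimeoutError is raised on the EC2 instance.
    db["articles"].drop()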