I'm trying to deploy a spider to an AWS EC2 instance; the spider uploads the scraped data to MongoDB Atlas. The spider itself is fine and works as intended (it adds new records to the database) when I run scrapy crawl <spidername> locally, but it does not crawl when triggered on the EC2 instance.
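For context, the MongoDB side of my pipeline does roughly the following. This is a simplified sketch: the connection URI, database, and collection names are placeholders, and, as the traceback below shows, the real pipelines.py builds the client and touches the collection at module import time (line 7, in <module>).

```python
# pipelines.py (simplified sketch; URI, database and collection names are placeholders)
import pymongo

# The client is created at module import time, which is why the traceback
# below fails inside "pipelines.py", line 7, in <module>.
client = pymongo.MongoClient(
    "mongodb+srv://<user>:<password>@cluster0-ssnba.mongodb.net/test"
)
db = client["jneuro"]
collection = db["articles"]
collection.drop()  # start with a clean collection on every run

class MongoPipeline:
    def process_item(self, item, spider):
        collection.insert_one(dict(item))  # one document per scraped item
        return item
```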
My problems are:
- I haven't been able to run this spider from my EC2 instance.
- I don't know how to schedule the spider once it is actually deployed and could use advice; my only idea so far is sketched right after this list.
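For scheduling, the best I've come up with is a cron entry on the instance that re-posts the job through scrapyd's HTTP API, something like the following (the daily 03:00 schedule is just an example):

```
# Run the spider once a day at 03:00 via scrapyd's schedule.json endpoint
0 3 * * * curl http://localhost:6800/schedule.json -d project=jneuro -d spider=jneuro_spider
```

I have no idea whether this is the idiomatic way to do it.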
After running curl (my-ec2-url):6800/schedule.json -d project=jneuro -d spider=jneuro_spider
from my local machine, my log shows:
2020-04-30 19:55:39 [scrapy.utils.log] INFO: Scrapy 2.1.0 started (bot: jneurosci)
2020-04-30 19:55:39 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 20.3.0, Python 3.6.10 (default, Feb 10 2020, 19:55:14) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1g 21 Apr 2020), cryptography 2.9.2, Platform Linux-4.14.171-105.231.amzn1.x86_64-x86_64-with-glibc2.3.4
2020-04-30 19:55:39 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2020-04-30 19:55:39 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'jneurosci',
'LOG_FILE': 'logs/jneuro/jneuro/8e4ee02c8b1c11eab8350a2092b3924a.log',
'NEWSPIDER_MODULE': 'jneurosci.spiders',
'ROBOTSTXT_OBEY': True,
'SPIDER_MODULES': ['jneurosci.spiders']}
2020-04-30 19:55:39 [scrapy.extensions.telnet] INFO: Telnet Password: 3478769ec93271ab
2020-04-30 19:55:39 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats']
2020-04-30 19:55:39 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2020-04-30 19:55:39 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-04-30 19:56:12 [twisted] CRITICAL: Unhandled error in Deferred:
2020-04-30 19:56:12 [twisted] CRITICAL:
Traceback (most recent call last):
File "/home/ec2-user/.local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File "/home/ec2-user/.local/lib/python3.6/site-packages/scrapy/crawler.py", line 87, in crawl
self.engine = self._create_engine()
File "/home/ec2-user/.local/lib/python3.6/site-packages/scrapy/crawler.py", line 101, in _create_engine
return ExecutionEngine(self, lambda _: self.stop())
File "/home/ec2-user/.local/lib/python3.6/site-packages/scrapy/core/engine.py", line 70, in __init__
self.scraper = Scraper(crawler)
File "/home/ec2-user/.local/lib/python3.6/site-packages/scrapy/core/scraper.py", line 71, in __init__
self.itemproc = itemproc_cls.from_crawler(crawler)
File "/home/ec2-user/.local/lib/python3.6/site-packages/scrapy/middleware.py", line 53, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "/home/ec2-user/.local/lib/python3.6/site-packages/scrapy/middleware.py", line 34, in from_settings
mwcls = load_object(clspath)
File "/home/ec2-user/.local/lib/python3.6/site-packages/scrapy/utils/misc.py", line 50, in load_object
mod = import_module(module)
File "/usr/lib64/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
File "/tmp/jneuro-1588273690-fp_er4rk.egg/jneurosci/pipelines.py", line 7, in <module>
File "/home/ec2-user/.local/lib/python3.6/site-packages/pymongo/collection.py", line 1103, in drop
dbo.drop_collection(self.__name, session=session)
File "/home/ec2-user/.local/lib/python3.6/site-packages/pymongo/database.py", line 914, in drop_collection
with self.__client._socket_for_writes(session) as sock_info:
File "/home/ec2-user/.local/lib/python3.6/site-packages/pymongo/mongo_client.py", line 1266, in _socket_for_writes
server = self._select_server(writable_server_selector, session)
File "/home/ec2-user/.local/lib/python3.6/site-packages/pymongo/mongo_client.py", line 1253, in _select_server
server = topology.select_server(server_selector)
File "/home/ec2-user/.local/lib/python3.6/site-packages/pymongo/topology.py", line 235, in select_server
address))
File "/home/ec2-user/.local/lib/python3.6/site-packages/pymongo/topology.py", line 193, in select_servers
selector, server_timeout, address)
File "/home/ec2-user/.local/lib/python3.6/site-packages/pymongo/topology.py", line 209, in _select_servers_loop
self._error_message(selector))
pymongo.errors.ServerSelectionTimeoutError: cluster0-shard-00-01-ssnba.mongodb.net:27017: timed out,cluster0-shard-00-00-ssnba.mongodb.net:27017: timed out,cluster0-shard-00-02-ssnba.mongodb.net:27017: timed out
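So the job dies before any crawling starts, with pymongo timing out on all three Atlas shards. To narrow this down I plan to run a minimal connectivity check directly on the EC2 instance; this is a sketch with a placeholder URI (the real one points at the cluster0-...-ssnba.mongodb.net hosts shown above), and the short serverSelectionTimeoutMS just makes the failure show up quickly:

```python
import pymongo
from pymongo.errors import ServerSelectionTimeoutError

# Placeholder URI; substitute the real Atlas connection string.
client = pymongo.MongoClient(
    "mongodb+srv://<user>:<password>@cluster0-ssnba.mongodb.net/test",
    serverSelectionTimeoutMS=5000,  # fail fast instead of hanging
)
try:
    client.admin.command("ping")  # raises ServerSelectionTimeoutError if unreachable
    print("MongoDB Atlas is reachable from this host")
except ServerSelectionTimeoutError as exc:
    print("Cannot reach Atlas:", exc)
```

If the ping fails too, I assume the problem is network access (e.g. the EC2 instance's IP not being whitelisted in Atlas) rather than the spider itself.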