I am using SplashRequest to execute some JavaScript code through a Lua script.
With a short list of URLs everything works fine, but the problem appears when the list contains more than ~50 URLs. In that case Splash stops, and nothing is written to the error logs.
I run Splash from Docker, and I tried setting a timeout:
docker run -p 5023:5023 -p 8050:8050 -p 8051:8051 scrapinghub/splash --max-timeout 300
My spider also sets a timeout:
yield SplashRequest(u, endpoint="render.html", callback=self.parse, dont_filter=True, meta={
    "url": u,
    "Keyword": kw,
    "splash": {"endpoint": "execute", "args": {"lua_source": self.script, "wait": 0.5, "timeout": 3600}}
})
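Note that the request sets 'timeout': 3600, while the container is started with --max-timeout 300. As far as I understand the Splash documentation, a per-request timeout larger than the server maximum is not accepted, so the two values should probably agree. This may be unrelated to Splash stopping. A minimal sketch of keeping them consistent is below; build_splash_args and SPLASH_MAX_TIMEOUT are hypothetical names introduced only for illustration and are not part of scrapy-splash.

```python
# Hypothetical helper, not part of scrapy-splash or Splash itself.
# Assumption: this constant mirrors `--max-timeout 300` from the docker command.
SPLASH_MAX_TIMEOUT = 300.0


def build_splash_args(lua_source, wait=0.5, timeout=3600,
                      max_timeout=SPLASH_MAX_TIMEOUT):
    """Build the args dict for the `execute` endpoint.

    The requested timeout is capped at max_timeout so the request never
    asks for more time than the Splash server is configured to allow.
    """
    return {
        "lua_source": lua_source,
        "wait": wait,
        "timeout": min(float(timeout), float(max_timeout)),
    }


# Example: the spider's 3600 s is reduced to the server limit of 300 s.
args = build_splash_args("function main(splash) return 1 end")
```

The resulting dict could then be used as the "args" value inside the "splash" meta key in place of the literal dictionary.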
The start of my Splash log:
2019-03-27 04:02:18+0000 [-] Log opened.
2019-03-27 04:02:18.376478 [-] Splash version: 3.3.1
2019-03-27 04:02:18.380078 [-] Qt 5.9.1, PyQt 5.9.2, WebKit 602.1, sip 4.19.4, Twisted 18.9.0, Lua 5.2
2019-03-27 04:02:18.380331 [-] Python 3.5.2 (default, Nov 12 2018, 13:43:14) [GCC 5.4.0 20160609]
2019-03-27 04:02:18.380775 [-] Open files limit: 1048576
2019-03-27 04:02:18.380978 [-] Can't bump open files limit
2019-03-27 04:02:18.490614 [-] Xvfb is started: ['Xvfb', ':1118895823', '-screen', '0', '1024x768x24', '-nolisten', 'tcp']
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
2019-03-27 04:02:18.796492 [-] proxy profiles support is enabled, proxy profiles path: /etc/splash/proxy-profiles
2019-03-27 04:02:18.796865 [-] memory cache: enabled, private mode: enabled, js cross-domain access: disabled
2019-03-27 04:02:18.993773 [-] verbosity=1, slots=20, argument_cache_max_entries=500, max-timeout=300.0
2019-03-27 04:02:18.994979 [-] Web UI: enabled, Lua: enabled (sandbox: enabled)
2019-03-27 04:02:18.995776 [-] Site starting on 8050
2019-03-27 04:02:18.996131 [-] Starting factory <twisted.web.server.Site object at 0x7f57e820dcf8>
2019-03-27 04:02:18.996736 [-] Server listening on http://0.0.0.0:8050
2019-03-27 04:03:36.389957 [-] "172.17.0.1" - - [27/Mar/2019:04:03:36 +0000] "GET /robots.txt HTTP/1.1" 404 153 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"
libpng warning: iCCP: known incorrect sRGB profile
libpng warning: iCCP: known incorrect sRGB profile
process 1: D-Bus library appears to be incorrectly set up; failed to read machine uuid: UUID file '/etc/machine-id' should contain a hex string of length 32, not length 0, with no other text
See the manual page for dbus-uuidgen to correct this issue.
qt.network.ssl: QSslSocket: cannot resolve SSLv2_client_method
qt.network.ssl: QSslSocket: cannot resolve SSLv2_server_method