Goal:
Build a Docker container to deploy a Selenium web scraper to AWS Lambda, using this page as a guide: https://robertorocha.info/setting-up-a-selenium-web-scraper-on-aws-lambda-with-python/
I am currently testing the Lambda environment locally with the command make docker-run, which produces the following error:
{"errorType":"WebDriverException","errorMessage":"Message: unknown error:
Chrome failed to start: exited abnormally\n (chrome not reachable)\n (The
process started from chrome location /var/task/bin/headless-chromium is no
longer running, so ChromeDriver is assuming that Chrome has
crashed.)\n","stackTrace":[" File \"/var/lang/lib/python3.7/imp.py\", line
234, in load_module\n return load_source(name, filename, file)\n","
File \"/var/lang/lib/python3.7/imp.py\", line 171, in load_source\n
module = _load(spec)\n"," File \"\u003cfrozen
importlib._bootstrap\u003e\", line 696, in _load\n"," File \"\u003cfrozen
importlib._bootstrap\u003e\", line 677, in _load_unlocked\n"," File
\"\u003cfrozen importlib._bootstrap_external\u003e\", line 728, in
exec_module\n"," File \"\u003cfrozen importlib._bootstrap\u003e\", line
219, in _call_with_frames_removed\n"," File
\"/var/task/src/lambda_function.py\", line 150, in \u003cmodule\u003e\n
lambda_handler()\n"," File \"/var/task/src/lambda_function.py\", line 33,
in lambda_handler\n driver =
webdriver.Chrome(executable_path=os.getcwd() + '/bin/chromedriver',
chrome_options=chrome_options)\n"," File
\"/var/task/lib/selenium/webdriver/chrome/webdriver.py\", line 81, in
__init__\n desired_capabilities=desired_capabilities)\n"," File
\"/var/task/lib/selenium/webdriver/remote/webdriver.py\", line 157, in
__init__\n self.start_session(capabilities, browser_profile)\n"," File
\"/var/task/lib/selenium/webdriver/remote/webdriver.py\", line 252, in
start_session\n response = self.execute(Command.NEW_SESSION,
parameters)\n"," File \"/var/task/lib/selenium/webdriver/remote/webdriver.py\", line 321, in
execute\n self.error_handler.check_response(response)\n"," File
\"/var/task/lib/selenium/webdriver/remote/errorhandler.py\", line 242, in
check_response\n raise exception_class(message, screen, stacktrace)\n"]}
Versions:
chrome 79.0.3945.88
headless-chromium 1.0.0-55
python 3.7.1
from requirements.txt:
boto3 == 1.9.24
botocore == 1.12.24
selenium == 3.141
chromedriver-binary == 79.0.3945.36 (https://chromedriver.storage.googleapis.com/79.0.3945.36/chromedriver_linux64.zip)
beautifulsoup4 == 4.6.3
Chromedriver options from the scraper script:
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--window-size=1280x1696')
chrome_options.add_argument('--user-data-dir=/tmp/user-data')
chrome_options.add_argument('--hide-scrollbars')
chrome_options.add_argument('--enable-logging')
chrome_options.add_argument('--log-level=0')
chrome_options.add_argument('--v=99')
chrome_options.add_argument('--single-process')
chrome_options.add_argument('--data-path=/tmp/data-path')
chrome_options.add_argument('--ignore-certificate-errors')
chrome_options.add_argument('--homedir=/tmp')
chrome_options.add_argument('--disk-cache-dir=/tmp/cache-dir')
chrome_options.add_argument('user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36')
chrome_options.binary_location = os.getcwd() + '/bin/headless-chromium'
driver = webdriver.Chrome(executable_path=os.getcwd() + '/bin/chromedriver', chrome_options=chrome_options)
Relevant Makefile code:
# Get chromedriver
curl -SL https://chromedriver.storage.googleapis.com/79.0.3945.36/chromedriver_linux64.zip > chromedriver.zip
unzip chromedriver.zip -d bin/
# Get Headless-chrome
curl -SL https://github.com/adieuadieu/serverless-chrome/releases/download/v1.0.0-55/stable-headless-chromium-amazonlinux-2017-03.zip > headless-chromium.zip
unzip headless-chromium.zip -d bin/
Relevant docker-compose.yml code (env variables):
# scraper path
PYTHONPATH=/var/task/src:/var/task/lib
# chromedriver and headless-chromium path
PATH=/var/task/bin
Attempts:
Downgraded chromedriver (v2.43) and selenium (3.14)
Downgraded chromedriver (v2.41) and selenium (3.141)
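
Since the attempts above all come down to pairing chromedriver with a compatible Chromium build, one way to check what is actually bundled in bin/ is to ask each binary for its version inside the container. The following is only a minimal sketch: the helper names major_version and binary_version are hypothetical, and it assumes both binaries are executable and respond to --version in this environment.

```python
import os
import re
import subprocess
from typing import Optional


def major_version(version_text: str) -> Optional[int]:
    """Return the major number of the first dotted version in the text, or None."""
    match = re.search(r"(\d+)\.\d+(?:\.\d+)*", version_text)
    return int(match.group(1)) if match else None


def binary_version(path: str) -> str:
    """Run `<path> --version` and return whatever the binary prints."""
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode output as text (works on Python 3.7)
        timeout=30,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Assumption: same layout as in the scraper script, i.e. <cwd>/bin/...
    for name in ("chromedriver", "headless-chromium"):
        path = os.path.join(os.getcwd(), "bin", name)
        try:
            text = binary_version(path)
        except (OSError, subprocess.TimeoutExpired) as exc:
            print("{}: could not run ({})".format(name, exc))
            continue
        print("{}: {!r} -> major {}".format(name, text, major_version(text)))
```

If the two major versions printed differ, that mismatch would be the first thing to rule out before changing any Chrome options.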