Я запускаю скрипт на AWS (Ubunut) экземпляре EC2. Это веб-скребок, который использует селен / хромедрайвер и безголовый chrome для очистки некоторых веб-страниц. Раньше у меня был этот скрипт без проблем, но сегодня я получаю сообщение об ошибке. Вот скрипт:
options = Options()
options.add_argument('--no-sandbox')
options.add_argument('--window-size=1420,1080')
options.add_argument('--headless')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-gpu')
options.add_argument("--disable-notifications")
options.binary_location='/usr/bin/chromium-browser'
driver = webdriver.Chrome(chrome_options=options)
#Set base url (SAN FRANCISCO)
base_url = 'https://www.bandsintown.com/en/c/san-francisco-ca?page='
events = []
for i in range(1,90):
#cycle through pages in range
driver.get(base_url + str(i))
pageURL = base_url + str(i)
print(pageURL)
Когда я запускаю этот скрипт из ubuntu, я получаю эту ошибку:
Traceback (most recent call last):
File "BandsInTown_Scraper_SF.py", line 91, in <module>
driver = webdriver.Chrome(chrome_options=options)
File "/home/ubuntu/.local/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
self.service.start()
File "/home/ubuntu/.local/lib/python3.6/site-packages/selenium/webdriver/common/service.py", line 76, in start
stdin=PIPE)
File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1295, in _execute_child
restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
Я подтвердил, что я использую ту же версию браузера Chromedriver / Chromium :
ChromeDriver 79.0.3945.130 (e22de67c28798d98833a7137c0e22876237fc40a-refs/branch-heads/3945@{#1047})
Chromium 79.0.3945.130 Built on Ubuntu , running on Ubuntu 18.04
Любая помощь приветствуется, потому что я в тупике. Что бы это ни стоило, у меня это работает на компьютере ma c, и у меня есть несколько скриптов веб-скрепинга, подобных этому, выполняющихся на одном и том же экземпляре EC2 (пока только 2 скрипта, так что не так много). Спасибо.
ОБНОВЛЕНИЕ: теперь я также получаю следующие ошибки при попытке запустить этот скрипт в Ubuntu:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 141, in _new_conn
(self.host, self.port), self.timeout, **extra_kw)
File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 60, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/usr/lib/python3.6/socket.py", line 745, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 601, in urlopen
chunked=chunked)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 346, in _make_request
self._validate_conn(conn)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 852, in _validate_conn
conn.connect()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 284, in connect
conn = self._new_conn()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 150, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f90945757f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
^[[B File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 639, in urlopen
^[[B^[[A^[[A _stacktrace=sys.exc_info()[2])
File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 398, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.bandsintown.com', port=443): Max retries exceeded with url: /en/c/san-francisco-ca?page=6 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f90945757f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "BandsInTown_Scraper_SF.py", line 39, in <module>
res = requests.get(url)
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.bandsintown.com', port=443): Max retries exceeded with url: /en/c/san-francisco-ca?page=6 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f90945757f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
Наконец, вот мое ежемесячное использование AWS в месяц, которое не не показывать превышение квоты памяти.