Airflow tasks that run an Athena query on AWS time out
25 March 2020

I set up a horizontally scaled Airflow deployment on AWS as follows: one AWS ECS -> 4 EC2 instances -> two of the EC2 instances run the services (redis, postgres, webserver, celery, scheduler), and the other two EC2 instances run 2 Airflow workers.

When I trigger a DAG, the webserver and Flower both show the job/task status. The tasks here run a few DROP TABLE and CREATE TABLE statements in Athena. However, the tasks get stuck and then fail with a connection error to Athena:

ERROR - Connect timeout on endpoint URL: "https://athena.us-west-2.amazonaws.com/"

Below is the Airflow task log:

*** Reading remote log from s3://nyegireddi-dev-s3/airflow/logs/localny_create_athena_tables/drop_table_fqa_bug_athena/2020-03-25T16:42:03.678648+00:00/1.log.
[2020-03-25 09:45:25,810] {{taskinstance.py:620}} INFO - Dependencies all met for <TaskInstance: localny_create_athena_tables.drop_table_fqa_bug_athena 2020-03-25T16:42:03.678648+00:00 [queued]>
[2020-03-25 09:45:26,309] {{taskinstance.py:620}} INFO - Dependencies all met for <TaskInstance: localny_create_athena_tables.drop_table_fqa_bug_athena 2020-03-25T16:42:03.678648+00:00 [queued]>
[2020-03-25 09:45:26,309] {{taskinstance.py:838}} INFO - 
--------------------------------------------------------------------------------
[2020-03-25 09:45:26,310] {{taskinstance.py:839}} INFO - Starting attempt 1 of 2
[2020-03-25 09:45:26,310] {{taskinstance.py:840}} INFO - 
--------------------------------------------------------------------------------
[2020-03-25 09:45:26,420] {{taskinstance.py:859}} INFO - Executing <Task(AWSAthenaOperator): drop_table_fqa_bug_athena> on 2020-03-25T16:42:03.678648+00:00
[2020-03-25 09:45:26,420] {{base_task_runner.py:133}} INFO - Running: ['airflow', 'run', 'localny_create_athena_tables', 'drop_table_fqa_bug_athena', '2020-03-25T16:42:03.678648+00:00', '--job_id', '71214', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/localny/create_athena_tables.py', '--cfg_path', '/tmp/tmpthq4d58r']
[2020-03-25 09:45:34,924] {{base_task_runner.py:115}} INFO - Job 71214: Subtask drop_table_fqa_bug_athena /usr/local/lib/python3.7/site-packages/airflow/configuration.py:226: FutureWarning: The task_runner setting in [core] has the old default value of 'BashTaskRunner'. This value has been changed to 'StandardTaskRunner' in the running config, but please update your config before Apache Airflow 2.0.
[2020-03-25 09:45:34,924] {{base_task_runner.py:115}} INFO - Job 71214: Subtask drop_table_fqa_bug_athena   FutureWarning
[2020-03-25 09:45:34,925] {{base_task_runner.py:115}} INFO - Job 71214: Subtask drop_table_fqa_bug_athena /usr/local/lib/python3.7/site-packages/airflow/configuration.py:606: DeprecationWarning: Specifying both AIRFLOW_HOME environment variable and airflow_home in the config file is deprecated. Please use only the AIRFLOW_HOME environment variable and remove the config file entry.
[2020-03-25 09:45:34,925] {{base_task_runner.py:115}} INFO - Job 71214: Subtask drop_table_fqa_bug_athena   warnings.warn(msg, category=DeprecationWarning)
[2020-03-25 09:45:34,925] {{base_task_runner.py:115}} INFO - Job 71214: Subtask drop_table_fqa_bug_athena /usr/local/lib/python3.7/site-packages/airflow/config_templates/airflow_local_settings.py:65: DeprecationWarning: The elasticsearch_host option in [elasticsearch] has been renamed to host - the old setting has been used, but please update your config.
[2020-03-25 09:45:34,925] {{base_task_runner.py:115}} INFO - Job 71214: Subtask drop_table_fqa_bug_athena   ELASTICSEARCH_HOST = conf.get('elasticsearch', 'HOST')
[2020-03-25 09:45:34,925] {{base_task_runner.py:115}} INFO - Job 71214: Subtask drop_table_fqa_bug_athena /usr/local/lib/python3.7/site-packages/airflow/config_templates/airflow_local_settings.py:67: DeprecationWarning: The elasticsearch_log_id_template option in [elasticsearch] has been renamed to log_id_template - the old setting has been used, but please update your config.
[2020-03-25 09:45:34,925] {{base_task_runner.py:115}} INFO - Job 71214: Subtask drop_table_fqa_bug_athena   ELASTICSEARCH_LOG_ID_TEMPLATE = conf.get('elasticsearch', 'LOG_ID_TEMPLATE')
[2020-03-25 09:45:34,925] {{base_task_runner.py:115}} INFO - Job 71214: Subtask drop_table_fqa_bug_athena /usr/local/lib/python3.7/site-packages/airflow/config_templates/airflow_local_settings.py:69: DeprecationWarning: The elasticsearch_end_of_log_mark option in [elasticsearch] has been renamed to end_of_log_mark - the old setting has been used, but please update your config.
[2020-03-25 09:45:34,925] {{base_task_runner.py:115}} INFO - Job 71214: Subtask drop_table_fqa_bug_athena   ELASTICSEARCH_END_OF_LOG_MARK = conf.get('elasticsearch', 'END_OF_LOG_MARK')
[2020-03-25 09:45:35,700] {{base_task_runner.py:115}} INFO - Job 71214: Subtask drop_table_fqa_bug_athena [2020-03-25 09:45:35,627] {settings.py:213} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=571
[2020-03-25 09:45:36,701] {{base_task_runner.py:115}} INFO - Job 71214: Subtask drop_table_fqa_bug_athena [2020-03-25 09:45:36,619] {__init__.py:51} INFO - Using executor CeleryExecutor
[2020-03-25 09:45:38,435] {{base_task_runner.py:115}} INFO - Job 71214: Subtask drop_table_fqa_bug_athena [2020-03-25 09:45:38,435] {dagbag.py:90} INFO - Filling up the DagBag from /usr/local/airflow/dags/localny/create_athena_tables.py
[2020-03-25 09:45:39,950] {{base_task_runner.py:115}} INFO - Job 71214: Subtask drop_table_fqa_bug_athena [2020-03-25 09:45:39,949] {cli.py:516} INFO - Running <TaskInstance: localny_create_athena_tables.drop_table_fqa_bug_athena 2020-03-25T16:42:03.678648+00:00 [running]> on host ip-172-31-27-180.us-west-2.compute.internal
[2020-03-25 10:15:47,954] {{taskinstance.py:1051}} ERROR - Connect timeout on endpoint URL: "https://athena.us-west-2.amazonaws.com/"
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 157, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/botocore/httpsession.py", line 263, in send
    chunked=self._chunked(request.headers),
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 720, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 376, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 376, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 994, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 334, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 164, in _new_conn
    % (self.host, self.timeout),
urllib3.exceptions.ConnectTimeoutError: (<botocore.awsrequest.AWSHTTPSConnection object at 0x7f141f701390>, 'Connection to athena.us-west-2.amazonaws.com timed out. (connect timeout=60)')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 926, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/operators/aws_athena_operator.py", line 82, in execute
    self.result_configuration, self.client_request_token)
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/hooks/aws_athena_hook.py", line 70, in run_query
    ResultConfiguration=result_configuration)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 648, in _make_api_call
    operation_model, request_dict, request_context)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 667, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 137, in _send_request
    success_response, exception):
  File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 231, in _needs_retry
    caught_exception=caught_exception, request_dict=request_dict)
  File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 183, in __call__
    if self._checker(attempts, response, caught_exception):
  File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 251, in __call__
    caught_exception)
  File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 277, in _should_retry
    return self._checker(attempt_number, response, caught_exception)
  File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 317, in __call__
    caught_exception)
  File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 223, in __call__
    attempt_number, caught_exception)
  File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 359, in _check_caught_exception
    raise caught_exception
  File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 200, in _do_get_response
    http_response = self._send(request)
  File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 244, in _send
    return self.http_session.send(request)
  File "/usr/local/lib/python3.7/site-packages/botocore/httpsession.py", line 287, in send
    raise ConnectTimeoutError(endpoint_url=request.url, error=e)
botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://athena.us-west-2.amazonaws.com/"
[2020-03-25 10:15:47,956] {{taskinstance.py:1074}} INFO - Marking task as UP_FOR_RETRY
[2020-03-25 10:15:47,982] {{logging_mixin.py:95}} INFO - [2020-03-25 10:15:47,981] {configuration.py:299} WARNING - section/key [smtp/smtp_user] not found in config
[2020-03-25 10:33:16,048] {{taskinstance.py:1086}} ERROR - Failed to send email to: ['sie-la-dataeng@sony.com']
[2020-03-25 10:33:16,048] {{taskinstance.py:1087}} ERROR - [Errno 110] Connection timed out
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 157, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/botocore/httpsession.py", line 263, in send
    chunked=self._chunked(request.headers),
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 720, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 376, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 376, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 994, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 334, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 164, in _new_conn
    % (self.host, self.timeout),
urllib3.exceptions.ConnectTimeoutError: (<botocore.awsrequest.AWSHTTPSConnection object at 0x7f141f701390>, 'Connection to athena.us-west-2.amazonaws.com timed out. (connect timeout=60)')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 926, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/operators/aws_athena_operator.py", line 82, in execute
    self.result_configuration, self.client_request_token)
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/hooks/aws_athena_hook.py", line 70, in run_query
    ResultConfiguration=result_configuration)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 648, in _make_api_call
    operation_model, request_dict, request_context)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 667, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 137, in _send_request
    success_response, exception):
  File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 231, in _needs_retry
    caught_exception=caught_exception, request_dict=request_dict)
  File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 183, in __call__
    if self._checker(attempts, response, caught_exception):
  File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 251, in __call__
    caught_exception)
  File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 277, in _should_retry
    return self._checker(attempt_number, response, caught_exception)
  File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 317, in __call__
    caught_exception)
  File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 223, in __call__
    attempt_number, caught_exception)
  File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 359, in _check_caught_exception
    raise caught_exception
  File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 200, in _do_get_response
    http_response = self._send(request)
  File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 244, in _send
    return self.http_session.send(request)
  File "/usr/local/lib/python3.7/site-packages/botocore/httpsession.py", line 287, in send
    raise ConnectTimeoutError(endpoint_url=request.url, error=e)
botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://athena.us-west-2.amazonaws.com/"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1076, in handle_failure
    self.email_alert(error)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1307, in email_alert
    send_email(self.task.email, subject, html_content)
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/email.py", line 55, in send_email
    mime_subtype=mime_subtype, mime_charset=mime_charset, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/email.py", line 101, in send_email_smtp
    send_MIME_email(smtp_mail_from, recipients, msg, dryrun)
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/email.py", line 121, in send_MIME_email
    s = smtplib.SMTP_SSL(SMTP_HOST, SMTP_PORT) if SMTP_SSL else smtplib.SMTP(SMTP_HOST, SMTP_PORT)
  File "/usr/local/lib/python3.7/smtplib.py", line 251, in __init__
    (code, msg) = self.connect(host, port)
  File "/usr/local/lib/python3.7/smtplib.py", line 336, in connect
    self.sock = self._get_socket(host, port, self.timeout)
  File "/usr/local/lib/python3.7/smtplib.py", line 307, in _get_socket
    self.source_address)
  File "/usr/local/lib/python3.7/socket.py", line 727, in create_connection
    raise err
  File "/usr/local/lib/python3.7/socket.py", line 716, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out
drop_table_fqa_bug_athena urllib3.exceptions.ConnectTimeoutError: (<botocore.awsrequest.AWSHTTPSConnection object at 0x7f141f701390>, 'Connection to athena.us-west-2.amazonaws.com timed out. (connect timeout=60)')
[2020-03-25 10:33:16,084] {{base_task_runner.py:115}} INFO - Job 71214: Subtask drop_table_fqa_bug_athena 
...
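
In the log above, both the Athena call and the later SMTP alert fail at `sock.connect`. That may mean the worker host has no outbound network route at all, rather than anything specific to Athena. Below is a minimal sketch for checking plain TCP reachability from a worker host, assuming Python 3 is available there. `can_connect` is a hypothetical helper name introduced for illustration; it is not part of Airflow or botocore.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers timeouts, refused connections, and DNS resolution failures.
        return False
```

Running `can_connect("athena.us-west-2.amazonaws.com", 443)` on the worker host (here `ip-172-31-27-180.us-west-2.compute.internal`) and on one of the service hosts would show whether the workers alone lack a path to the endpoint. A `False` result only on the workers would suggest looking at their subnet routing or security group egress rules, though this check alone cannot say which.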