Airflow scheduler keeps crashing, DB connection error (Google Composer)
June 28, 2018

I have been using Google Composer (composer-0.5.2-airflow-1.9.0) for a while, and I have run into problems with the Airflow scheduler. The scheduler container crashes from time to time, and it can end up in a locked state in which it cannot start any new tasks (a database connection error), so I have to recreate the whole Composer environment. This time the status is CrashLoopBackOff, and the scheduler pod can no longer restart. The error is very similar to one I have had before. Here is the traceback from Stackdriver:

Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 27, in <module>
    args.func(args)
  File "/usr/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 826, in scheduler
    job.run()
  File "/usr/local/lib/python2.7/site-packages/airflow/jobs.py", line 198, in run
    self._execute()
  File "/usr/local/lib/python2.7/site-packages/airflow/jobs.py", line 1549, in _execute
    self._execute_helper(processor_manager)
  File "/usr/local/lib/python2.7/site-packages/airflow/jobs.py", line 1594, in _execute_helper
    self.reset_state_for_orphaned_tasks(session=session)
  File "/usr/local/lib/python2.7/site-packages/airflow/utils/db.py", line 50, in wrapper
    result = func(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/airflow/jobs.py", line 266, in reset_state_for_orphaned_tasks
    .filter(or_(*filter_for_tis), TI.state.in_(resettable_states))
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2783, in all
    return list(self)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2935, in __iter__
    return self._execute_and_instances(context)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2958, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1413, in _handle_dbapi_exception
    exc_info
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 508, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python2.7/site-packages/MySQLdb/cursors.py", line 250, in execute
    self.errorhandler(self, exc, value)
  File "/usr/local/lib/python2.7/site-packages/MySQLdb/connections.py", line 50, in defaulterrorhandler
    raise errorvalue
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1205, 'Lock wait timeout exceeded; try restarting transaction') [SQL: u'SELECT task_instance.try_number AS task_instance_try_number, task_instance.task_id AS task_instance_task_id, task_instance.dag_id AS task_instance_dag_id, task_instance.execution_date AS task_instance_execution_date, task_instance.start_date AS task_instance_start_date, task_instance.end_date AS task_instance_end_date, task_instance.duration AS task_instance_duration, task_instance.state AS task_instance_state, task_instance.max_tries AS task_instance_max_tries, task_instance.hostname AS task_instance_hostname, task_instance.unixname AS task_instance_unixname, task_instance.job_id AS task_instance_job_id, task_instance.pool AS task_instance_pool, task_instance.queue AS task_instance_queue, task_instance.priority_weight AS task_instance_priority_weight, task_instance.operator AS task_instance_operator, task_instance.queued_dttm AS task_instance_queued_dttm, task_instance.pid AS task_instance_pid \nFROM task_instance \nWHERE (task_instance.dag_id = %s AND task_instance.task_id = %s AND task_instance.execution_date = %s OR task_instance.dag_id = %s AND task_instance.task_id = %s AND task_instance.execution_date = %s OR task_instance.dag_id = %s AND task_instance.task_id = %s AND task_instance.execution_date = %s OR task_instance.dag_id = %s AND task_instance.task_id = %s AND task_instance.execution_date = %s OR task_instance.dag_id = %s AND task_instance.task_id = %s AND task_instance.execution_date = %s OR task_instance.dag_id = %s AND task_instance.task_id = %s AND task_instance.execution_date = %s) AND task_instance.state IN (%s, %s) FOR UPDATE'] [parameters: ('pb_write_event_tables_v2_dev2', 'check_table_chest_progressed', datetime.datetime(2018, 6, 26, 8, 0), 'pb_write_event_tables_v2_dev2', 'check_table_name_changed', datetime.datetime(2018, 6, 26, 8, 0), 'pb_write_event_tables_v2_dev2', 'check_table_registered', 
datetime.datetime(2018, 6, 26, 8, 0), 'pb_write_event_tables_v2_dev2', 'check_table_unit_leveled_up', datetime.datetime(2018, 6, 26, 8, 0), 'pb_write_event_tables_v2_dev2', 'check_table_virtual_currency_earned', datetime.datetime(2018, 6, 26, 8, 0), 'pb_write_event_tables_v2_dev2', 'check_table_virtual_currency_spent', datetime.datetime(2018, 6, 26, 8, 0), u'scheduled', u'queued')] (Background on this error at: http://sqlalche.me/e/e3q8)

I am out of my depth when it comes to low-level DBMS errors. Still, this is a stock Google Composer with the default environment, so I wonder whether anyone else has had a similar problem or has any idea what is going on. As far as I understand, Composer uses Google Cloud SQL for the database, apparently (?) with a MySQL backend.

The Airflow scheduler image is gcr.io/cloud-airflow-releaser/airflow-worker-scheduler-1.9.0:cloud_composer_service_2018-06-19-RC3.

I should add that I never ran into this scheduler problem with my self-built Airflow setup on Kubernetes, but there I was using the latest Airflow version with PostgreSQL.
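For what it is worth, the error message itself says "try restarting transaction". Below is a minimal sketch of what that would mean in plain Python, not what the Airflow scheduler actually does. The names `run_with_retry` and `FakeOperationalError` are hypothetical and exist only for illustration. The sketch assumes the driver exception carries the MySQL error code as its first argument, as in the `(1205, 'Lock wait timeout exceeded; ...')` tuple in the traceback; with SQLAlchemy the driver exception would probably have to be read from the wrapping exception's `orig` attribute instead.

```python
import time

# MySQL error code from the traceback: "Lock wait timeout exceeded"
LOCK_WAIT_TIMEOUT = 1205


class FakeOperationalError(Exception):
    """Hypothetical stand-in for a DB-API OperationalError.

    args[0] is assumed to hold the MySQL error code, args[1] the message.
    """


def run_with_retry(transaction, retries=3, delay=0.0,
                   error_cls=FakeOperationalError):
    """Call transaction() and restart it if it hits a lock wait timeout.

    Any other error, or a timeout on the last attempt, is re-raised.
    """
    for attempt in range(1, retries + 1):
        try:
            return transaction()
        except error_cls as exc:
            code = exc.args[0] if exc.args else None
            if code != LOCK_WAIT_TIMEOUT or attempt == retries:
                raise
            # Back off a little longer on each attempt before restarting.
            time.sleep(delay * attempt)
```

A retry like this would only hide the symptom. If some other session holds the row locks on `task_instance` indefinitely, every restart of the `SELECT ... FOR UPDATE` would time out again, which would match the CrashLoopBackOff I am seeing.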

...