Unable to create a Dataproc cluster on Google Cloud Platform: "NodeInitializationAction must specify executable"
2 votes
/ March 26, 2020

I get the error below when creating a Dataproc cluster on Google Cloud Platform. We are using the Mercury plugin for Airflow. I just want to understand what the problem is. I have tried many options but have not been able to reach any conclusion so far.

[2020-03-26 06:04:14,612] {base_task_runner.py:95} INFO - Subtask: [2020-03-26 06:04:14,612] {models.py:1352} INFO - Executing <Task(GoogleCloudCreateDataprocCluster): create_medax_cluster> on 2020-03-26 06:03:40.963562
[2020-03-26 06:04:14,649] {base_task_runner.py:95} INFO - Subtask: [2020-03-26 06:04:14,649] {gcp_api_base_hook.py:73} INFO - Getting connection using `gcloud auth` user, since no key file is defined for hook.
[2020-03-26 06:04:14,655] {base_task_runner.py:95} INFO - Subtask: [2020-03-26 06:04:14,655] {discovery.py:267} INFO - URL being requested: GET https://www.googleapis.com/discovery/v1/apis/dataproc/v1beta2/rest
[2020-03-26 06:04:14,655] {base_task_runner.py:95} INFO - Subtask: [2020-03-26 06:04:14,655] {transport.py:157} INFO - Attempting refresh to obtain initial access_token
[2020-03-26 06:04:14,732] {base_task_runner.py:95} INFO - Subtask: [2020-03-26 06:04:14,731] {discovery.py:866} INFO - URL being requested: GET https://dataproc.googleapis.com/v1beta2/projects/dh-xxxxxxxx-xxxxxxxx-72410/regions/xxxxxxxx/clusters?alt=json
[2020-03-26 06:04:15,455] {base_task_runner.py:95} INFO - Subtask: File gs://xxxxxxxx-xxxxxxxx-dpl-artif/dataproc/dataproc-init.sh will not be executed on dataproc startup.
[2020-03-26 06:04:15,455] {base_task_runner.py:95} INFO - Subtask: [2020-03-26 06:04:15,455] {discovery.py:866} INFO - URL being requested: POST https://dataproc.googleapis.com/v1beta2/projects/dh-xxxxxxxx-xxxxxxxx-72410/regions/xxxxxxxx/clusters?alt=json
[2020-03-26 06:04:20,490] {base_task_runner.py:95} INFO - Subtask: [2020-03-26 06:04:20,490] {discovery.py:866} INFO - URL being requested: GET https://dataproc.googleapis.com/v1beta2/projects/dh-xxxxxxxx-xxxxxxxx-72410/regions/xxxxxxxx/clusters?alt=json
[2020-03-26 06:04:20,534] {base_task_runner.py:95} INFO - Subtask: [2020-03-26 06:04:20,533] {models.py:1427} ERROR - <HttpError 400 when requesting https://dataproc.googleapis.com/v1beta2/projects/dh-xxxxxxxx-xxxxxxxx-72410/regions/xxxxxxxx/clusters?alt=json returned "Multiple validation errors:
[2020-03-26 06:04:20,535] {base_task_runner.py:95} INFO - Subtask:  - NodeInitializationAction must specify executable
[2020-03-26 06:04:20,535] {base_task_runner.py:95} INFO - Subtask:  - Object URI '' is not a valid GCS URI">
[2020-03-26 06:04:20,535] {base_task_runner.py:95} INFO - Subtask: Traceback (most recent call last):
[2020-03-26 06:04:20,535] {base_task_runner.py:95} INFO - Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 1384, in run
[2020-03-26 06:04:20,535] {base_task_runner.py:95} INFO - Subtask:     result = task_copy.execute(context=context)
[2020-03-26 06:04:20,535] {base_task_runner.py:95} INFO - Subtask:   File "/usr/local/airflow/plugins/mercury_plugins.py", line 1516, in execute
[2020-03-26 06:04:20,535] {base_task_runner.py:95} INFO - Subtask:     raise e
[2020-03-26 06:04:20,536] {base_task_runner.py:95} INFO - Subtask: HttpError: <HttpError 400 when requesting https://dataproc.googleapis.com/v1beta2/projects/dh-xxxxxxxx-xxxxxxxx-72410/regions/xxxxxxxx/clusters?alt=json returned "Multiple validation errors:
[2020-03-26 06:04:20,536] {base_task_runner.py:95} INFO - Subtask:  - NodeInitializationAction must specify executable
[2020-03-26 06:04:20,536] {base_task_runner.py:95} INFO - Subtask:  - Object URI '' is not a valid GCS URI">
[2020-03-26 06:04:20,536] {base_task_runner.py:95} INFO - Subtask: [2020-03-26 06:04:20,534] {models.py:1451} INFO - Marking task as FAILED.
[2020-03-26 06:04:20,537] {base_task_runner.py:95} INFO - Subtask: [2020-03-26 06:04:20,537] {configuration.py:609} WARNING - section/key [smtp/smtp_user] not found in config
[2020-03-26 06:04:20,538] {base_task_runner.py:95} INFO - Subtask: [2020-03-26 06:04:20,538] {models.py:1466} ERROR - Failed at executing callback
[2020-03-26 06:04:20,539] {base_task_runner.py:95} INFO - Subtask: [2020-03-26 06:04:20,538] {models.py:1467} ERROR - [Errno 99] Cannot assign requested address
[2020-03-26 06:04:20,539] {base_task_runner.py:95} INFO - Subtask: Traceback (most recent call last):
[2020-03-26 06:04:20,539] {base_task_runner.py:95} INFO - Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 1464, in handle_failure
[2020-03-26 06:04:20,539] {base_task_runner.py:95} INFO - Subtask:     task.on_failure_callback(context)
[2020-03-26 06:04:20,539] {base_task_runner.py:95} INFO - Subtask:   File "/usr/local/airflow/dags/notification.py", line 35, in on_failure_callback
[2020-03-26 06:04:20,539] {base_task_runner.py:95} INFO - Subtask:     return operator.execute(context=context)
[2020-03-26 06:04:20,540] {base_task_runner.py:95} INFO - Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/operators/email_operator.py", line 64, in execute
[2020-03-26 06:04:20,540] {base_task_runner.py:95} INFO - Subtask:     send_email(self.to, self.subject, self.html_content, files=self.files, cc=self.cc, bcc=self.bcc, mime_subtype=self.mime_subtype)
[2020-03-26 06:04:20,540] {base_task_runner.py:95} INFO - Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/utils/email.py", line 44, in send_email
[2020-03-26 06:04:20,540] {base_task_runner.py:95} INFO - Subtask:     return backend(to, subject, html_content, files=files, dryrun=dryrun, cc=cc, bcc=bcc, mime_subtype=mime_subtype)
[2020-03-26 06:04:20,540] {base_task_runner.py:95} INFO - Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/utils/email.py", line 86, in send_email_smtp
[2020-03-26 06:04:20,541] {base_task_runner.py:95} INFO - Subtask:     send_MIME_email(SMTP_MAIL_FROM, recipients, msg, dryrun)
[2020-03-26 06:04:20,541] {base_task_runner.py:95} INFO - Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/utils/email.py", line 104, in send_MIME_email
[2020-03-26 06:04:20,541] {base_task_runner.py:95} INFO - Subtask:     s = smtplib.SMTP_SSL(SMTP_HOST, SMTP_PORT) if SMTP_SSL else smtplib.SMTP(SMTP_HOST, SMTP_PORT)
[2020-03-26 06:04:20,541] {base_task_runner.py:95} INFO - Subtask:   File "/usr/lib/python2.7/smtplib.py", line 256, in __init__
[2020-03-26 06:04:20,542] {base_task_runner.py:95} INFO - Subtask:     (code, msg) = self.connect(host, port)
[2020-03-26 06:04:20,542] {base_task_runner.py:95} INFO - Subtask:   File "/usr/lib/python2.7/smtplib.py", line 316, in connect
[2020-03-26 06:04:20,542] {base_task_runner.py:95} INFO - Subtask:     self.sock = self._get_socket(host, port, self.timeout)
[2020-03-26 06:04:20,542] {base_task_runner.py:95} INFO - Subtask:   File "/usr/lib/python2.7/smtplib.py", line 291, in _get_socket
[2020-03-26 06:04:20,542] {base_task_runner.py:95} INFO - Subtask:     return socket.create_connection((host, port), timeout)
[2020-03-26 06:04:20,543] {base_task_runner.py:95} INFO - Subtask:   File "/usr/lib/python2.7/socket.py", line 575, in create_connection
[2020-03-26 06:04:20,543] {base_task_runner.py:95} INFO - Subtask:     raise err
[2020-03-26 06:04:20,543] {base_task_runner.py:95} INFO - Subtask: error: [Errno 99] Cannot assign requested address
[2020-03-26 06:04:20,553] {base_task_runner.py:95} INFO - Subtask: [2020-03-26 06:04:20,553] {models.py:1472} ERROR - <HttpError 400 when requesting https://dataproc.googleapis.com/v1beta2/projects/dh-xxxxxxxx-xxxxxxxx-72410/regions/xxxxxxxx/clusters?alt=json returned "Multiple validation errors:
[2020-03-26 06:04:20,553] {base_task_runner.py:95} INFO - Subtask:  - NodeInitializationAction must specify executable
[2020-03-26 06:04:20,554] {base_task_runner.py:95} INFO - Subtask:  - Object URI '' is not a valid GCS URI">
[2020-03-26 06:04:20,554] {base_task_runner.py:95} INFO - Subtask: Traceback (most recent call last):
[2020-03-26 06:04:20,554] {base_task_runner.py:95} INFO - Subtask:   File "/usr/local/bin/airflow", line 28, in <module>
[2020-03-26 06:04:20,554] {base_task_runner.py:95} INFO - Subtask:     args.func(args)
[2020-03-26 06:04:20,554] {base_task_runner.py:95} INFO - Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 422, in run
[2020-03-26 06:04:20,554] {base_task_runner.py:95} INFO - Subtask:     pool=args.pool,
[2020-03-26 06:04:20,555] {base_task_runner.py:95} INFO - Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/utils/db.py", line 53, in wrapper
[2020-03-26 06:04:20,555] {base_task_runner.py:95} INFO - Subtask:     result = func(*args, **kwargs)
[2020-03-26 06:04:20,555] {base_task_runner.py:95} INFO - Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 1384, in run
[2020-03-26 06:04:20,555] {base_task_runner.py:95} INFO - Subtask:     result = task_copy.execute(context=context)
[2020-03-26 06:04:20,555] {base_task_runner.py:95} INFO - Subtask:   File "/usr/local/airflow/plugins/mercury_plugins.py", line 1516, in execute
[2020-03-26 06:04:20,555] {base_task_runner.py:95} INFO - Subtask:     raise e
[2020-03-26 06:04:20,556] {base_task_runner.py:95} INFO - Subtask: googleapiclient.errors.HttpError: <HttpError 400 when requesting https://dataproc.googleapis.com/v1beta2/projects/dh-xxxxxxxx-xxxxxxxx-72410/regions/xxxxxxxx/clusters?alt=json returned "Multiple validation errors:
[2020-03-26 06:04:20,556] {base_task_runner.py:95} INFO - Subtask:  - NodeInitializationAction must specify executable
[2020-03-26 06:04:20,556] {base_task_runner.py:95} INFO - Subtask:  - Object URI '' is not a valid GCS URI">
[2020-03-26 06:04:22,646] {jobs.py:2107} INFO - Task exited with return code 1

I tried changing the cluster name, but that did not help.

1 Answer

1 vote
/ March 26, 2020

The problem is that you specified an empty path for the initialization action in your Airflow workflow:

INFO - Subtask:  - Object URI '' is not a valid GCS URI">

Here is an example of how to do it with DataprocClusterCreateOperator in Airflow:

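# Import path for Airflow 1.x (contrib operators)
from airflow.contrib.operators.dataproc_operator import DataprocClusterCreateOperator

DataprocClusterCreateOperator(
    # ...
    # Each entry must be a non-empty gs:// URI of an executable script
    init_actions_uris=['gs://<BUCKET>/path/to/init/action.sh'],
    # ...
)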

If you are not using initialization actions, either do not set the init_actions_uris parameter at all, or set it to None.
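
If the URI comes from a variable or config value that may be empty, one way to avoid the error is to filter the list before it reaches the operator. The following is a minimal sketch, not part of Airflow or the Mercury plugin: the helper names `clean_init_actions` and `cluster_kwargs` are hypothetical, and it assumes the operator is called with keyword arguments that you build yourself.

```python
def clean_init_actions(uris):
    """Hypothetical helper: drop empty entries and return None if nothing is left.

    Raises ValueError if a non-empty entry is not a gs:// URI.
    """
    # Remove None, empty strings, and whitespace-only strings.
    cleaned = [u.strip() for u in (uris or []) if u and u.strip()]
    bad = [u for u in cleaned if not u.startswith("gs://")]
    if bad:
        raise ValueError("Not a GCS URI: {!r}".format(bad))
    # An empty list becomes None so the parameter can be omitted entirely.
    return cleaned or None


def cluster_kwargs(base_kwargs, uris):
    """Hypothetical helper: add init_actions_uris only when there is something to run."""
    kwargs = dict(base_kwargs)
    cleaned = clean_init_actions(uris)
    if cleaned is not None:
        kwargs["init_actions_uris"] = cleaned
    return kwargs
```

The resulting dictionary could then be passed as `DataprocClusterCreateOperator(**kwargs)`. With an input such as `['']`, the key is left out instead of sending an empty URI to the Dataproc API.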

...