Problem setting up Ray with Google Cloud
0 votes
/ October 29, 2019

I am trying to set up a Ray cluster using Kubernetes, following https://ray.readthedocs.io/en/latest/autoscaling.html#kubernetes. Here are my steps:

  1. Create a Kubernetes cluster on Google Cloud Platform
  2. Connect to the cluster through Cloud Shell
  3. Run the following commands: sudo pip install -U ray, sudo pip install kubernetes
  4. Run ray up (with the example config file)

I am then asked whether to create a new cluster. I answer yes. It then keeps printing "error from server (badrequest): pod ray-head-242dd does not have a host assigned".

Then I tried the approach from https://ray.readthedocs.io/en/latest/autoscaling.html#gcp. I changed the project name in the example-full yaml and ran ray up on that yaml. Here is the output:

   WARNING: Not monitoring node memory since `psutil` is not installed. Install this with `pip install psutil` (or ray[debug]) to enable debugging of memory-related crashes.
2019-10-28 17:06:58,254 WARNING __init__.py:44 -- file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery_cache/__init__.py", line 41, in autodetect
    from . import file_cache
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery_cache/file_cache.py", line 41, in <module>
    'file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth')
ImportError: file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth
2019-10-28 17:06:58,258 INFO discovery.py:271 -- URL being requested: GET https://www.googleapis.com/discovery/v1/apis/cloudresourcemanager/v1/rest
2019-10-28 17:06:58,397 WARNING __init__.py:44 -- file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery_cache/__init__.py", line 41, in autodetect
    from . import file_cache
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery_cache/file_cache.py", line 41, in <module>
    'file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth')
ImportError: file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth
2019-10-28 17:06:58,398 INFO discovery.py:271 -- URL being requested: GET https://www.googleapis.com/discovery/v1/apis/iam/v1/rest
2019-10-28 17:06:58,448 WARNING __init__.py:44 -- file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery_cache/__init__.py", line 41, in autodetect
    from . import file_cache
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery_cache/file_cache.py", line 41, in <module>
    'file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth')
ImportError: file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth
2019-10-28 17:06:58,448 INFO discovery.py:271 -- URL being requested: GET https://www.googleapis.com/discovery/v1/apis/compute/v1/rest
2019-10-28 17:06:58,609 INFO discovery.py:867 -- URL being requested: GET https://cloudresourcemanager.googleapis.com/v1/projects/project?alt=json
2019-10-28 17:06:58,700 INFO discovery.py:867 -- URL being requested: GET https://iam.googleapis.com/v1/projects/project/serviceAccounts/ray-autoscaler-sa-v1@project.iam.gserviceaccount.com?alt=json
2019-10-28 17:06:58,764 INFO config.py:165 -- _configure_iam_role: Creating new service account ray-autoscaler-sa-v1
2019-10-28 17:06:58,772 INFO discovery.py:867 -- URL being requested: POST https://iam.googleapis.com/v1/projects/project/serviceAccounts?alt=json
2019-10-28 17:06:59,449 INFO discovery.py:867 -- URL being requested: POST https://cloudresourcemanager.googleapis.com/v1/projects/project:getIamPolicy?alt=json
2019-10-28 17:06:59,591 INFO discovery.py:867 -- URL being requested: POST https://cloudresourcemanager.googleapis.com/v1/projects/project:setIamPolicy?alt=json
2019-10-28 17:07:00,095 INFO discovery.py:867 -- URL being requested: GET https://compute.googleapis.com/compute/v1/projects/project?alt=json
2019-10-28 17:07:00,319 INFO config.py:238 -- _configure_key_pair: Creating new key pair ray-autoscaler_gcp_us-west1_project_ubuntu
2019-10-28 17:07:00,409 INFO discovery.py:867 -- URL being requested: POST https://compute.googleapis.com/compute/v1/projects/project/setCommonInstanceMetadata?alt=json
2019-10-28 17:07:01,025 INFO config.py:59 -- wait_for_compute_global_operation: Waiting for operation operation-1572296820417-595fee1766329-d528523f-5b1ebecc to finish...
2019-10-28 17:07:01,031 INFO discovery.py:867 -- URL being requested: GET https://compute.googleapis.com/compute/v1/projects/project/global/operations/operation-1572296820417-595fee1766329-d528523f-5b1ebecc?alt=json
2019-10-28 17:07:06,261 INFO discovery.py:867 -- URL being requested: GET https://compute.googleapis.com/compute/v1/projects/project/global/operations/operation-1572296820417-595fee1766329-d528523f-5b1ebecc?alt=json
2019-10-28 17:07:11,491 INFO discovery.py:867 -- URL being requested: GET https://compute.googleapis.com/compute/v1/projects/project/global/operations/operation-1572296820417-595fee1766329-d528523f-5b1ebecc?alt=json
2019-10-28 17:07:11,744 INFO config.py:70 -- wait_for_compute_global_operation: Operation done.
2019-10-28 17:07:11,745 INFO config.py:265 -- _configure_key_pair: Private key not specified in config, using/home/zh2408/.ssh/ray-autoscaler_gcp_us-west1_project_ubuntu.pem
2019-10-28 17:07:11,755 INFO discovery.py:867 -- URL being requested: GET https://compute.googleapis.com/compute/v1/projects/project/regions/us-west1/subnetworks?alt=json
2019-10-28 17:07:11,908 WARNING __init__.py:44 -- file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery_cache/__init__.py", line 41, in autodetect
    from . import file_cache
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery_cache/file_cache.py", line 41, in <module>
    'file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth')
ImportError: file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth
2019-10-28 17:07:11,909 INFO discovery.py:271 -- URL being requested: GET https://www.googleapis.com/discovery/v1/apis/compute/v1/rest
2019-10-28 17:07:12,040 INFO discovery.py:867 -- URL being requested: GET https://compute.googleapis.com/compute/v1/projects/project/zones/us-west1-a/instances?filter=%28%28labels.ray-node-type+%3D+head%29%29+AND+%28%28status+%3D+RUNNING%29+OR+%28status+%3D+STAGING%29+OR+%28status+%3D+PROVISIONING%29%29+AND+%28labels.ray-cluster-name+%3D+default%29&alt=json
This will create a new cluster [y/N]: y
2019-10-28 17:07:17,457 INFO commands.py:201 -- get_or_create_head_node: Launching new head node...
2019-10-28 17:07:17,472 INFO discovery.py:867 -- URL being requested: POST https://compute.googleapis.com/compute/v1/projects/project/zones/us-west1-a/instances?alt=json
2019-10-28 17:07:19,474 INFO node_provider.py:26 -- wait_for_compute_zone_operation: Waiting for operation operation-1572296837479-595fee27abde7-e9b428db-4d0e22ec to finish...
2019-10-28 17:07:19,476 INFO discovery.py:867 -- URL being requested: GET https://compute.googleapis.com/compute/v1/projects/project/zones/us-west1-a/operations/operation-1572296837479-595fee27abde7-e9b428db-4d0e22ec?alt=json
2019-10-28 17:07:24,717 INFO discovery.py:867 -- URL being requested: GET https://compute.googleapis.com/compute/v1/projects/project/zones/us-west1-a/operations/operation-1572296837479-595fee27abde7-e9b428db-4d0e22ec?alt=json
2019-10-28 17:07:25,039 INFO node_provider.py:37 -- wait_for_compute_zone_operation: Operation operation-1572296837479-595fee27abde7-e9b428db-4d0e22ec finished.
2019-10-28 17:07:25,055 INFO discovery.py:867 -- URL being requested: GET https://compute.googleapis.com/compute/v1/projects/project/zones/us-west1-a/instances?filter=%28%28labels.ray-launch-config+%3D+07f3c1fd9b3e0be05984f720952adf2b99563d9d%29+AND+%28labels.ray-node-type+%3D+head%29+AND+%28labels.ray-node-name+%3D+ray-default-head%29%29+AND+%28%28status+%3D+RUNNING%29+OR+%28status+%3D+STAGING%29+OR+%28status+%3D+PROVISIONING%29%29+AND+%28labels.ray-cluster-name+%3D+default%29&alt=json
2019-10-28 17:07:25,802 INFO commands.py:214 -- get_or_create_head_node: Updating files on head node...
2019-10-28 17:07:25,806 INFO updater.py:356 -- NodeUpdater: ray-default-head-f3ed05cc: Updating to 2ae7e7f3db51902552832d843b3db964635184e5
2019-10-28 17:07:25,820 INFO discovery.py:867 -- URL being requested: GET https://compute.googleapis.com/compute/v1/projects/project/zones/us-west1-a/instances?filter=%28%28status+%3D+RUNNING%29+OR+%28status+%3D+STAGING%29+OR+%28status+%3D+PROVISIONING%29%29+AND+%28labels.ray-cluster-name+%3D+default%29&alt=json
2019-10-28 17:07:26,030 INFO discovery.py:867 -- URL being requested: POST https://compute.googleapis.com/compute/v1/projects/project/zones/us-west1-a/instances/ray-default-head-f3ed05cc/setLabels?alt=json
2019-10-28 17:07:26,766 INFO node_provider.py:26 -- wait_for_compute_zone_operation: Waiting for operation operation-1572296846037-595fee2fd53e7-f3e51edb-17229134 to finish...
2019-10-28 17:07:26,768 INFO discovery.py:867 -- URL being requested: GET https://compute.googleapis.com/compute/v1/projects/project/zones/us-west1-a/operations/operation-1572296846037-595fee2fd53e7-f3e51edb-17229134?alt=json
2019-10-28 17:07:32,033 INFO discovery.py:867 -- URL being requested: GET https://compute.googleapis.com/compute/v1/projects/project/zones/us-west1-a/operations/operation-1572296846037-595fee2fd53e7-f3e51edb-17229134?alt=json
2019-10-28 17:07:32,336 INFO node_provider.py:37 -- wait_for_compute_zone_operation: Operation operation-1572296846037-595fee2fd53e7-f3e51edb-17229134 finished.
2019-10-28 17:07:32,337 INFO updater.py:398 -- NodeUpdater: ray-default-head-f3ed05cc: Waiting for remote shell...
2019-10-28 17:07:32,337 INFO updater.py:210 -- NodeUpdater: ray-default-head-f3ed05cc: Waiting for IP...
2019-10-28 17:07:32,337 INFO log_timer.py:21 -- NodeUpdater: ray-default-head-f3ed05cc: Got IP [LogTimer=0ms]
2019-10-28 17:07:32,354 INFO updater.py:262 -- NodeUpdater: ray-default-head-f3ed05cc: Running uptime on 34.82.120.14...
ssh: connect to host 34.82.120.14 port 22: Connection refused
2019-10-28 17:07:38,502 INFO updater.py:262 -- NodeUpdater: ray-default-head-f3ed05cc: Running uptime on 34.82.120.14...
ssh: connect to host 34.82.120.14 port 22: Connection refused
2019-10-28 17:07:43,602 INFO updater.py:262 -- NodeUpdater: ray-default-head-f3ed05cc: Running uptime on 34.82.120.14...
ssh: connect to host 34.82.120.14 port 22: Connection refused
2019-10-28 17:07:48,686 INFO updater.py:262 -- NodeUpdater: ray-default-head-f3ed05cc: Running uptime on 34.82.120.14...
ssh: connect to host 34.82.120.14 port 22: Connection refused
2019-10-28 17:07:53,792 INFO updater.py:262 -- NodeUpdater: ray-default-head-f3ed05cc: Running uptime on 34.82.120.14...
ssh: connect to host 34.82.120.14 port 22: Connection refused
2019-10-28 17:07:58,878 INFO updater.py:262 -- NodeUpdater: ray-default-head-f3ed05cc: Running uptime on 34.82.120.14...
ssh: connect to host 34.82.120.14 port 22: Connection refused
2019-10-28 17:08:03,965 INFO updater.py:262 -- NodeUpdater: ray-default-head-f3ed05cc: Running uptime on 34.82.120.14...
ssh: connect to host 34.82.120.14 port 22: Connection refused
2019-10-28 17:08:09,053 INFO updater.py:262 -- NodeUpdater: ray-default-head-f3ed05cc: Running uptime on 34.82.120.14...
ssh: connect to host 34.82.120.14 port 22: Connection refused
2019-10-28 17:08:14,143 INFO updater.py:262 -- NodeUpdater: ray-default-head-f3ed05cc: Running uptime on 34.82.120.14...
Warning: Permanently added '34.82.120.14' (ECDSA) to the list of known hosts.
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
 21:08:15 up 0 min,  0 users,  load average: 1.10, 0.32, 0.11
2019-10-28 17:08:15,103 INFO log_timer.py:21 -- NodeUpdater: ray-default-head-f3ed05cc: Got remote shell [LogTimer=42766ms]
2019-10-28 17:08:15,129 INFO discovery.py:867 -- URL being requested: GET https://compute.googleapis.com/compute/v1/projects/project/zones/us-west1-a/instances?filter=%28%28status+%3D+RUNNING%29+OR+%28status+%3D+STAGING%29+OR+%28status+%3D+PROVISIONING%29%29+AND+%28labels.ray-cluster-name+%3D+default%29&alt=json
2019-10-28 17:08:15,348 INFO discovery.py:867 -- URL being requested: POST https://compute.googleapis.com/compute/v1/projects/project/zones/us-west1-a/instances/ray-default-head-f3ed05cc/setLabels?alt=json
2019-10-28 17:08:16,008 INFO node_provider.py:26 -- wait_for_compute_zone_operation: Waiting for operation operation-1572296895356-595fee5edde25-16887d46-c522d063 to finish...
2019-10-28 17:08:16,011 INFO discovery.py:867 -- URL being requested: GET https://compute.googleapis.com/compute/v1/projects/project/zones/us-west1-a/operations/operation-1572296895356-595fee5edde25-16887d46-c522d063?alt=json
2019-10-28 17:08:21,313 INFO discovery.py:867 -- URL being requested: GET https://compute.googleapis.com/compute/v1/projects/project/zones/us-west1-a/operations/operation-1572296895356-595fee5edde25-16887d46-c522d063?alt=json
2019-10-28 17:08:21,581 INFO node_provider.py:37 -- wait_for_compute_zone_operation: Operation operation-1572296895356-595fee5edde25-16887d46-c522d063 finished.
2019-10-28 17:08:21,582 INFO updater.py:262 -- NodeUpdater: ray-default-head-f3ed05cc: Running mkdir -p ~ on 34.82.120.14...
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2019-10-28 17:08:21,741 INFO updater.py:460 -- NodeUpdater: ray-default-head-f3ed05cc: Syncing /tmp/ray-bootstrap-5XD_Sh to ~/ray_bootstrap_config.yaml...
2019-10-28 17:08:21,755 INFO log_timer.py:21 -- NodeUpdater: ray-default-head-f3ed05cc: Synced /tmp/ray-bootstrap-5XD_Sh to ~/ray_bootstrap_config.yaml [LogTimer=174ms]
2019-10-28 17:08:21,756 INFO log_timer.py:21 -- NodeUpdater: ray-default-head-f3ed05cc: Applied config 2ae7e7f3db51902552832d843b3db964635184e5 [LogTimer=55949ms]
2019-10-28 17:08:21,756 ERROR updater.py:367 -- NodeUpdater: ray-default-head-f3ed05cc: Error updating [Errno 2] No such file or directory
2019-10-28 17:08:21,770 INFO discovery.py:867 -- URL being requested: GET https://compute.googleapis.com/compute/v1/projects/project/zones/us-west1-a/instances?filter=%28%28status+%3D+RUNNING%29+OR+%28status+%3D+STAGING%29+OR+%28status+%3D+PROVISIONING%29%29+AND+%28labels.ray-cluster-name+%3D+default%29&alt=json
2019-10-28 17:08:22,006 INFO discovery.py:867 -- URL being requested: POST https://compute.googleapis.com/compute/v1/projects/project/zones/us-west1-a/instances/ray-default-head-f3ed05cc/setLabels?alt=json
2019-10-28 17:08:22,649 INFO node_provider.py:26 -- wait_for_compute_zone_operation: Waiting for operation operation-1572296902019-595fee65389b8-c0cc26c3-1813a77e to finish...
2019-10-28 17:08:22,651 INFO discovery.py:867 -- URL being requested: GET https://compute.googleapis.com/compute/v1/projects/project/zones/us-west1-a/operations/operation-1572296902019-595fee65389b8-c0cc26c3-1813a77e?alt=json
2019-10-28 17:08:27,936 INFO discovery.py:867 -- URL being requested: GET https://compute.googleapis.com/compute/v1/projects/project/zones/us-west1-a/operations/operation-1572296902019-595fee65389b8-c0cc26c3-1813a77e?alt=json
2019-10-28 17:08:28,180 INFO node_provider.py:37 -- wait_for_compute_zone_operation: Operation operation-1572296902019-595fee65389b8-c0cc26c3-1813a77e finished.
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/ray/autoscaler/updater.py", line 370, in run
    raise e
OSError: [Errno 2] No such file or directory
2019-10-28 17:08:28,214 INFO discovery.py:867 -- URL being requested: GET https://compute.googleapis.com/compute/v1/projects/project/zones/us-west1-a/instances?filter=%28%28labels.ray-launch-config+%3D+07f3c1fd9b3e0be05984f720952adf2b99563d9d%29+AND+%28labels.ray-node-type+%3D+head%29+AND+%28labels.ray-node-name+%3D+ray-default-head%29%29+AND+%28%28status+%3D+RUNNING%29+OR+%28status+%3D+STAGING%29+OR+%28status+%3D+PROVISIONING%29%29+AND+%28labels.ray-cluster-name+%3D+default%29&alt=json
2019-10-28 17:08:28,431 ERROR commands.py:277 -- get_or_create_head_node: Updating 34.82.120.14 failed

I can only see that a Ray VM instance was created. I have no idea what the errors mean or how to set up a Ray cluster on Google Cloud.

1 Answer

0 votes
/ October 30, 2019

The host-related error message:

error from server (badrequest): pod ray-head-242dd does not have a host assigned

means that the pod was not scheduled onto a node.

According to the documentation linked in your question, this Ray example is meant to run on a 2-vCPU machine (n1-standard-2).

The provided ray/python/ray/autoscaler/gcp/example-full.yaml cluster config file will create a small cluster with an n1-standard-2 head node.

The Pod definition sets a request of 1 vCPU. However, the pod needs a machine with more vCPUs than that, because other processes, pods, and resources run on the same node, and the node cannot allocate all of its CPU to the pod being scheduled.
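The arithmetic behind this can be sketched in a few lines. This is only an illustration, not the scheduler's actual implementation: the function name `fits_on_node` and all millicore figures below are hypothetical. The real values for a given node are listed by `kubectl describe node` under its allocatable and allocated resources.

```python
def fits_on_node(pod_request_m, allocatable_m, already_requested_m):
    """Return True if a pod's CPU request fits into the node's remaining CPU.

    All values are in millicores (1000m = 1 vCPU).
    """
    free_m = allocatable_m - already_requested_m
    return pod_request_m <= free_m


# Hypothetical figures, for illustration only.
pod_request_m = 1000  # the pod requests 1 vCPU

# A 1-vCPU node: part of the CPU is reserved for the system and other pods
# already hold requests, so less than 1000m is left for the new pod.
print(fits_on_node(pod_request_m, allocatable_m=940, already_requested_m=300))

# A 2-vCPU node with the same overhead leaves enough room.
print(fits_on_node(pod_request_m, allocatable_m=1930, already_requested_m=300))
```

With these assumed numbers the first call prints `False` and the second prints `True`, which is why a larger machine type lets the pod be scheduled.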

You can try again after setting a different machine type for your node pool.

As a side note, you can check why the pod failed by running the following command:

$ kubectl describe pod {YOUR-RAY-POD-NAME}

This will give you a hint about the cause of the problem, such as scheduling being prevented.
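The same check can also be scripted. The sketch below assumes `kubectl` is on the PATH and already configured for the cluster; the helper names `scheduling_problem` and `fetch_pod` are hypothetical, and the sample status is hand-written rather than real cluster output. It reads the `PodScheduled` condition from the pod's status, which Kubernetes sets to `False` with an explanatory message when the scheduler cannot place the pod.

```python
import json
import subprocess


def scheduling_problem(pod):
    """Return the scheduler's message if the pod is unscheduled, else None.

    `pod` is the dict parsed from `kubectl get pod NAME -o json`.
    """
    for cond in pod.get("status", {}).get("conditions", []):
        if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
            return "{}: {}".format(
                cond.get("reason", "Unknown"), cond.get("message", "")
            )
    return None


def fetch_pod(name, namespace="default"):
    """Fetch a pod's JSON via kubectl (requires access to the cluster)."""
    out = subprocess.check_output(
        ["kubectl", "get", "pod", name, "-n", namespace, "-o", "json"]
    )
    return json.loads(out)


# Hand-written sample status for illustration, not real cluster output.
sample = {
    "status": {
        "conditions": [
            {
                "type": "PodScheduled",
                "status": "False",
                "reason": "Unschedulable",
                "message": "0/1 nodes are available: 1 Insufficient cpu.",
            }
        ]
    }
}
print(scheduling_problem(sample))
```

Against a live cluster, `scheduling_problem(fetch_pod("ray-head-242dd"))` would report the scheduler's message for the pod named in the question, assuming the pod lives in the namespace passed to `fetch_pod`.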

...