Рэй не запускает рабочих на EC2 - PullRequest
0 голосов
/ 13 апреля 2019

Я использую модуль Ray для запуска кластера Ubuntu (16.04) на AWS EC2. В конфигурации я указал min_workers, max_workers и initial_workers как 2, потому что мне не нужно никакого автоматического изменения размера. Мне также нужен мастер-узел t2.micro и рабочие c4.8xarge. Кластер запускается, но только мастер (следующий вывод терминала от установки луча, .... минус детали): -

2019-04-18 14:52:48,462 INFO updater.py:268 -- NodeUpdater: Running pip3 install -U https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.7.0.dev2-cp35-cp35m-manylinux1_x86_64.whl on 54.226.178.23...
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
Collecting ray==0.7.0.dev2 from https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.7.0.dev2-cp35-cp35m-manylinux1_x86_64.whl
Downloading https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.7.0.dev2-cp35-cp35m-manylinux1_x86_64.whl (56.2MB)
.....
.....
Successfully built pyyaml
Installing collected packages: click, colorama, six, redis, typing, filelock, flatbuffers, numpy, pyyaml, more-itertools, setuptools, attrs, atomicwrites, pluggy, py, pathlib2, pytest, funcsigs, ray
Successfully installed atomicwrites attrs click colorama filelock flatbuffers funcsigs more-itertools numpy pathlib2 pluggy py pytest pyyaml-3.11 ray redis setuptools-20.7.0 six-1.10.0 typing
You are using pip version 8.1.1, however version 19.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
2019-04-18 14:53:32,656 INFO updater.py:268 -- NodeUpdater: Running pip3    install boto3==1.4.8 on 54.226.178.23...
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
Collecting boto3==1.4.8
Downloading https://files.pythonhosted.org/packages/7d/09/66fef826fb13a2cee74a1df56c269d2794a90ece49c3b77113b733e4b91d/boto3-1.4.8-
....
....
Installing collected packages: docutils, jmespath, six, python-dateutil, botocore, s3transfer, boto3
Successfully installed boto3-1.4.8 botocore-1.8.50 docutils-0.14 jmespath-0.9.4 python-dateutil-2.8.0 s3transfer-0.1.13 six-1.12.0
You are using pip version 8.1.1, however version 19.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
2019-04-18 14:53:37,805 INFO updater.py:268 -- NodeUpdater: Running ray stop on 54.226.178.23...
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
WARNING: Not monitoring node memory since `psutil` is not installed.  Install this with `pip install psutil` (or ray[debug]) to enable debugging of memory-related crashes.
2019-04-18 14:53:39,775 INFO updater.py:268 -- NodeUpdater: Running ulimit -n 65536; ray start --head --redis-port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml on 54.226.178.23...
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2019-04-18 18:53:40,167 INFO scripts.py:288 -- Using IP address 172.31.7.117 for this node.
2019-04-18 18:53:40,167 INFO node.py:469 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-04-18_18-53-40_7981/logs.
2019-04-18 18:53:40,271 INFO services.py:407 -- Waiting for redis server at 127.0.0.1:6379 to respond...
2019-04-18 18:53:40,389 INFO services.py:407 -- Waiting for redis server at 127.0.0.1:60491 to respond...
2019-04-18 18:53:40,390 INFO services.py:804 -- Starting Redis shard with 0.21 GB max memory.
2019-04-18 18:53:40,400 INFO node.py:483 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-04-18_18-53-40_7981/logs.
2019-04-18 18:53:40,410 INFO services.py:1439 -- Starting the Plasma object store with 0.31 GB memory using /dev/shm.
2019-04-18 18:53:40,421 WARNING services.py:907 -- Failed to start the reporter. The reporter requires 'pip install psutil'.
WARNING: Not monitoring node memory since `psutil` is not installed. Install this with `pip install psutil` (or ray[debug]) to enable debugging of memory-related crashes.
2019-04-18 18:53:40,425 INFO scripts.py:319 -- 
Started Ray on this node. You can add additional nodes to the cluster by calling

    ray start --redis-address 172.31.7.117:6379

from the node you wish to add. You can connect a driver to the cluster from Python by running

import ray
ray.init(redis_address="172.31.7.117:6379")

If you have trouble connecting from a different machine, check that your firewall is configured properly. If you wish to terminate the processes that have been started, run

ray stop
2019-04-18 14:53:40,593 INFO log_timer.py:21 -- NodeUpdater: i-064f62badf69f8cee: Setup commands completed [LogTimer=115941ms]
2019-04-18 14:53:40,593 INFO log_timer.py:21 -- NodeUpdater: i-064f62badf69f8cee: Applied config 248f16e493ac5bcd753a673eb7202fa2b49e0f9f  [LogTimer=173814ms]
2019-04-18 14:53:40,973 INFO log_timer.py:21 -- AWSNodeProvider: Set tag ray-node-status=up-to-date on ['i-064f62badf69f8cee'] [LogTimer=374ms]
2019-04-18 14:53:41,069 INFO commands.py:264 -- get_or_create_head_node:  Head node up-to-date, IP address is: 54.226.178.23
To monitor auto-scaling activity, you can run:

  ray exec ray_config.yaml  'tail -n 100 -f /tmp/ray/session_*/logs/monitor*'

To open a console on the cluster:

  ray attach ray_config.yaml

To ssh manually to the cluster, run:

  ssh -i /home/haines/.ssh/ray-autoscaler_us-east-1.pem ubuntu@54.226.178.23

2019-04-18 14:53:41,181 INFO log_timer.py:21 -- AWSNodeProvider: Set tag ray-runtime-config=248f16e493ac5bcd753a673eb7202fa2b49e0f9f on ['i-064f62badf69f8cee'] 

Я использовал стандартную конфигурацию (example-full.yaml) со следующими изменениями: -

min_workers: 2

initial_workers: 2

    type: aws
    region: us-east-1
    availability_zone: us-east1a,us-east-1b


head_node:
    InstanceType: t2.micro
    ImageId: ami-0565af6e282977273 # ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-20190212

worker_nodes:
    InstanceType: c4.8xlarge
    ImageId: ami-0f9cf087c1f27d9b1 # ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-20181114  

        #MarketType: spot

setup_commands:

- echo 'export PATH="$HOME/anaconda3/envs/tensorflow_p36/bin:$PATH"' >>     ~/.bashrc
    - sudo apt-get update
    - sudo apt-get install python3-pip
    - pip3 install -U https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.7.0.dev2-cp35-cp35m-manylinux1_x86_64.whl

    - pip3 install boto3==1.4.8  # 1.4.8 adds InstanceMarketOptions

Последняя неудачная настройка: -

setup_commands:
- sudo apt-get update
- wget https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh || true 1>/dev/null
- bash Anaconda3-5.0.1-Linux-x86_64.sh -b -p $HOME/anaconda3 || true 1>/dev/null
- echo 'export PATH="$HOME/anaconda3/bin:$PATH"' >> ~/.bashrc
- sudo pkill -9 apt-get || true
- sudo pkill -9 dpkg || true
- sudo dpkg --configure -a
- sudo apt-get install python3-pip || true
- pip3 install --upgrade pip
- pip3 install --user psutil
- pip3 install --user proctitle
- pip3 install --user ray
- pip3 install --user boto3==1.4.8
- pip3 install --user https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.7.0.dev2-cp35-cp35m-manylinux1_x86_64.whl

1 Ответ

0 голосов
/ 20 апреля 2019

Я запустил слегка измененную версию конфига, которую вы разместили, и это работает для меня

cluster_name: test

min_workers: 2

initial_workers: 2

provider:
    type: aws
    region: us-east-1
    availability_zone: us-east1a,us-east-1b

head_node:
    InstanceType: t2.micro
    ImageId: ami-0565af6e282977273 # ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-20190212

worker_nodes:
    InstanceType: c4.8xlarge
    ImageId: ami-0f9cf087c1f27d9b1 # ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-20181114
        #MarketType: spot

setup_commands:
    - sudo apt-get update
    # Install Anaconda.
    - wget https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh || true
    - bash Anaconda3-5.0.1-Linux-x86_64.sh -b -p $HOME/anaconda3 || true
    - echo 'export PATH="$HOME/anaconda3/bin:$PATH"' >> ~/.bashrc
    # Install Ray.
    - pip install ray
    - pip install boto3==1.4.8  # 1.4.8 adds InstanceMarketOptions

Единственное реальное отличие, которое я думаю, это установка Anaconda Python и установка PATH так, чтобы pip находит это правильно.Я подозреваю, что проблема была связана с тем, что не удалось найти нужную версию Python.

...