Job submission fails with: CondaHTTPError: HTTP 000 CONNECTION FAILED
0 votes
/ September 23, 2019

I'm trying to use the repo https://github.com/microsoft/MLAKSDeployAML/ to deploy an AML service with AKS.

I set this up on an NC6_v2 DSVM, and after struggling to get conda working at all, I finally got my environment configured and started working through the notebooks.

I submit the experiment, then wait on run.wait_for_completion(show_output=True), and it fails with an HTTP error. The full control log is included below.

Could this be related to the machine being a GPU machine, or is something else going on with the service?

Streaming log file azureml-logs/60_control_log.txt
Starting the daemon thread to refresh tokens in background for process with pid = 13317
nvidia-docker is installed on the target. Using nvidia-docker for docker operations.
Running: ['/bin/bash', '/tmp/azureml_runs/mlaks-train-on-local_1569245453_408a217b/azureml-environment-setup/docker_env_checker.sh']

Materialized image not found on target: azureml/azureml_473a6fe028e178fff5c9a8d49bc938f3


Logging experiment preparation status in history service.
Running: ['/bin/bash', '/tmp/azureml_runs/mlaks-train-on-local_1569245453_408a217b/azureml-environment-setup/docker_env_builder.sh']
Running: ['nvidia-docker', 'build', '-f', 'azureml-environment-setup/Dockerfile', '-t', 'azureml/azureml_473a6fe028e178fff5c9a8d49bc938f3', '.']
Sending build context to Docker daemon  410.1kB
Step 1/15 : FROM continuumio/miniconda3@sha256:54eb3dd4003f11f6a651b55fc2074a0ed6d9eeaa642f1c4c9a7cf8b148a30ceb
 ---> 4a51de2367be
Step 2/15 : USER root
 ---> Using cache
 ---> 42491a367cef
Step 3/15 : RUN mkdir -p $HOME/.cache
 ---> Using cache
 ---> 0771da9ffb76
Step 4/15 : WORKDIR /
 ---> Using cache
 ---> a8db57273ffb
Step 5/15 : COPY azureml-environment-setup/99brokenproxy /etc/apt/apt.conf.d/
 ---> Using cache
 ---> b2a669b740ca
Step 6/15 : RUN if dpkg --compare-versions `conda --version | grep -oE '[^ ]+$'` lt 4.4.11; then conda install conda==4.4.11; fi
 ---> Using cache
 ---> 1e430aeb68b0
Step 7/15 : COPY azureml-environment-setup/mutated_conda_dependencies.yml azureml-environment-setup/mutated_conda_dependencies.yml
 ---> Using cache
 ---> 0c6a9fafa84b
Step 8/15 : RUN ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_6303d702d8163bbfc0017533e979d4a3 -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf "$HOME/.cache/pip" && conda clean -aqy && CONDA_ROOT_DIR=$(conda info --root) && rm -rf "$CONDA_ROOT_DIR/pkgs" && find "$CONDA_ROOT_DIR" -type d -name __pycache__ -exec rm -rf {} + && ldconfig
 ---> Running in a579672607b3
Warning: you have pip-installed dependencies in your environment file, but you do not list pip itself as one of your conda dependencies.  Conda may not use the correct pip to install your packages, and they may end up in the wrong place.  Please add an explicit pip dependency.  I'm adding one for you, but still nagging you.
Collecting package metadata (repodata.json): ...working... failed

CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/conda-forge/linux-64/repodata.json>
Elapsed: -

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.
ConnectionError(MaxRetryError("HTTPSConnectionPool(host='conda.anaconda.org', port=443): Max retries exceeded with url: /conda-forge/linux-64/repodata.json (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fbb8c38cda0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))"))


The command '/bin/sh -c ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_6303d702d8163bbfc0017533e979d4a3 -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf "$HOME/.cache/pip" && conda clean -aqy && CONDA_ROOT_DIR=$(conda info --root) && rm -rf "$CONDA_ROOT_DIR/pkgs" && find "$CONDA_ROOT_DIR" -type d -name __pycache__ -exec rm -rf {} + && ldconfig' returned a non-zero code: 1


CalledProcessError(1, ['nvidia-docker', 'build', '-f', 'azureml-environment-setup/Dockerfile', '-t', 'azureml/azureml_473a6fe028e178fff5c9a8d49bc938f3', '.'])

Building docker image failed with exit code: 1



Logging error in history service: Failed to run ['/bin/bash', '/tmp/azureml_runs/mlaks-train-on-local_1569245453_408a217b/azureml-environment-setup/docker_env_builder.sh'] 
 Exit code 1 
Details can be found in azureml-logs/60_control_log.txt log file.

Uploading control log...
Sending final run history status...
Logging experiment failed status in history service.
Control script execution completed

1 Answer

0 votes
/ September 29, 2019

This is a transient network issue. Please retry.
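
If retrying keeps failing, it may help to check whether the host name resolves at all before resubmitting. The error in the log ends with "Temporary failure in name resolution", which points at a DNS lookup failure rather than at the GPU. Below is a minimal Python sketch of such a check. It is not part of the repo or of the Azure ML SDK, and the helper names can_resolve and wait_until are made up for illustration. It also runs on the VM itself, while the failure happened inside the Docker build, which may use a different DNS configuration, so a pass here does not guarantee the build will succeed.

```python
import socket
import time


def can_resolve(host, port=443):
    """Return True if the host name resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(host, port)) > 0
    except socket.gaierror:
        # Name resolution failed (the same class of failure as in the log).
        return False


def wait_until(check, attempts=5, delay_seconds=10.0, sleep=time.sleep):
    """Call check() up to `attempts` times, sleeping between failed tries."""
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            sleep(delay_seconds)
    return False


if __name__ == "__main__":
    host = "conda.anaconda.org"  # host taken from the error in the control log
    if wait_until(lambda: can_resolve(host)):
        print(f"{host} resolves; resubmitting the run is worth a try")
    else:
        print(f"{host} still does not resolve; check DNS on the VM and in Docker")
```

If the name resolves on the VM but the build still fails at the same step, the DNS settings used by the Docker daemon would be the next thing to look at.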
