Дополнительная библиотека lib64 добавлена ​​в CUDA_TOOLKIT_PATH при построении тензорного потока с поддержкой графического процессора - PullRequest
0 голосов
/ 22 января 2019

Я пытаюсь собрать Tensorflow из исходного кода с поддержкой GPU для конкретной версии CUDA и CuDNN.Поскольку я знаю, что конкретные версии, которые я хочу, доступны в nvidia-cuda-toolkit Debian stretch-backports, я создал следующий Dockerfile, основанный на официальном Dockerfile Tensorflow.

FROM debian:stretch

RUN echo 'deb http://deb.debian.org/debian stretch contrib non-free' >> /etc/apt/sources.list
RUN echo 'deb http://deb.debian.org/debian stretch-backports main contrib non-free' >> /etc/apt/sources.list

RUN apt-get update && apt-get install -y \
        nano \
        build-essential \
        curl \
        git \
        unzip \
        zip \
        libcurl3-dev \
        pkg-config \
        python-dev \
        software-properties-common \
        python3 \
        python3-dev \
        python3-pip

RUN apt-get install -t stretch-backports -y \
        nvidia-cuda-dev \
        nvidia-cuda-toolkit \
        && \
    rm -rf /var/lib/apt/lists/*


# Bazel

# Running bazel inside a `docker build` command causes trouble, cf:
#   https://github.com/bazelbuild/bazel/issues/134
# The easiest solution is to set up a bazelrc file forcing --batch.
RUN echo "startup --batch" >>/etc/bazel.bazelrc
# Similarly, we need to workaround sandboxing issues:
#   https://github.com/bazelbuild/bazel/issues/418
RUN echo "build --spawn_strategy=standalone --genrule_strategy=standalone" \
    >>/etc/bazel.bazelrc

ENV BAZEL_VERSION 0.16.0
WORKDIR /
RUN mkdir /bazel && \
    cd /bazel && \
    curl -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36" -fSsL -O https://github.com/bazelbuild/bazel/releases/download/$BAZEL_VERSION/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
    curl -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36" -fSsL -o /bazel/LICENSE.txt https://raw.githubusercontent.com/bazelbuild/bazel/master/LICENSE && \
    chmod +x bazel-*.sh && \
    ./bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
    cd / && \
    rm -f /bazel/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh

# Tensorflow
WORKDIR /tensorflow
RUN git clone --branch=r1.8 --depth=1 https://github.com/tensorflow/tensorflow.git .

# Configure the build for our CUDA configuration.

ENV TF_NEED_GCP=0 \
    TF_NEED_HDFS=0 \
    TF_NEED_OPENCL=0 \
    TF_NEED_JEMALLOC=0 \
    TF_ENABLE_XLA=0 \
    TF_NEED_VERBS=0 \
    TF_CUDA_CLANG=0 \
    TF_DOWNLOAD_CLANG=0 \
    TF_NEED_MKL=0 \
    TF_DOWNLOAD_MKL=0 \
    TF_NEED_MPI=0 \
    TF_NEED_S3=0 \
    TF_NEED_KAFKA=0 \
    TF_NEED_GDR=0 \
    TF_NEED_OPENCL_SYCL=0 \
    TF_SET_ANDROID_WORKSPACE=0 \
    TF_NEED_AWS=0 \
    CI_BUILD_PYTHON=python3 \
    LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/:$LD_LIBRARY_PATH \
    TF_NEED_TENSORRT=1 \
    TF_NEED_CUDA=1 \
    TF_CUDA_COMPUTE_CAPABILITIES=3.5,5.2,6.0,6.1,7.0 \
    TF_CUDA_VERSION=9.1 \
    TF_CUDNN_VERSION=7 \
    CUDA_TOOLKIT_PATH=/usr/lib/x86_64-linux-gnu/ \
    CUDNN_INSTALL_PATH=/usr/lib/x86_64-linux-gnu/

RUN LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/:${LD_LIBRARY_PATH} \
    tensorflow/tools/ci_build/builds/configured GPU \
    bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.1 --copt=-msse4.2 --config=cuda \
    --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" \
        tensorflow/tools/pip_package:build_pip_package && \
    bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/pip

Моя главная проблема - к концу.Хотя я установил LD_LIBRARY_PATH, CUDA_TOOLKIT_PATH и CUDNN_INSTALL_PATH на /usr/lib/x86_64-linux-gnu/, Базель игнорирует его, возвращая Invalid path to CUDA 9.1 toolkit. /usr/lib/x86_64-linux-gnu/lib64/libcudart.so.9.1 cannot be found.Есть дополнительный /lib64/, который я не знаю, откуда он взялся.Для полной ошибки см. Ниже.

Вводя в образ докера, я вижу, что libcudart.so.9.1 находится в данном пути /usr/lib/x86_64-linux-gnu/, но я не уверен, почему Базел настаивает на добавлении /lib64/ вперед ним.

Как можно принудительно указать путь, чтобы он прекратил добавление дополнительной lib64?

Заранее спасибо.

Выдержка из установки Bazel:

You have bazel 0.16.0 installed.
Found possible Python library paths:
  /usr/local/lib/python3.5/dist-packages
  /usr/lib/python3/dist-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python3.5/dist-packages]
Invalid path to CUDA 9.1 toolkit. /usr/lib/x86_64-linux-gnu/lib64/libcudart.so.9.1 cannot be found
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 

Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 

Invalid path to CUDA 9.0 toolkit. /usr/local/cuda/lib64/libcudart.so.9.0 cannot be found
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 

Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 

Invalid path to CUDA 9.0 toolkit. /usr/local/cuda/lib64/libcudart.so.9.0 cannot be found
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 

Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 

Invalid path to CUDA 9.0 toolkit. /usr/local/cuda/lib64/libcudart.so.9.0 cannot be found
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 

Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 

Invalid path to CUDA 9.0 toolkit. /usr/local/cuda/lib64/libcudart.so.9.0 cannot be found
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 

Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 

Invalid path to CUDA 9.0 toolkit. /usr/local/cuda/lib64/libcudart.so.9.0 cannot be found
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 

Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 

Invalid path to CUDA 9.0 toolkit. /usr/local/cuda/lib64/libcudart.so.9.0 cannot be found
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 

Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 

Invalid path to CUDA 9.0 toolkit. /usr/local/cuda/lib64/libcudart.so.9.0 cannot be found
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 

Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 

Invalid path to CUDA 9.0 toolkit. /usr/local/cuda/lib64/libcudart.so.9.0 cannot be found
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 

Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 

Invalid path to CUDA 9.0 toolkit. /usr/local/cuda/lib64/libcudart.so.9.0 cannot be found
Traceback (most recent call last):
  File "configure.py", line 1580, in <module>
    main()
  File "configure.py", line 1515, in main
    set_tf_cuda_version(environ_cp)
  File "configure.py", line 895, in set_tf_cuda_version
    _DEFAULT_PROMPT_ASK_ATTEMPTS)
__main__.UserInputError: Invalid TF_CUDA_SETTING setting was provided 10 times in a row. Assuming to be a scripting mistake.
The command '/bin/sh -c LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:${LD_LIBRARY_PATH}     tensorflow/tools/ci_build/builds/configured GPU     bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.1 --copt=-msse4.2 --config=cuda  --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0"         tensorflow/tools/pip_package:build_pip_package &&     bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/pip' returned a non-zero code: 1
Добро пожаловать на сайт PullRequest, где вы можете задавать вопросы и получать ответы от других членов сообщества.
...