I am trying to build TensorFlow from source with GPU support for a specific version of CUDA and cuDNN. Since I know that the specific versions I want are available in the nvidia-cuda-toolkit package from Debian stretch-backports, I created the following Dockerfile, based on the official TensorFlow Dockerfile.
FROM debian:stretch
RUN echo 'deb http://deb.debian.org/debian stretch contrib non-free' >> /etc/apt/sources.list
RUN echo 'deb http://deb.debian.org/debian stretch-backports main contrib non-free' >> /etc/apt/sources.list
RUN apt-get update && apt-get install -y \
nano \
build-essential \
curl \
git \
unzip \
zip \
libcurl3-dev \
pkg-config \
python-dev \
software-properties-common \
python3 \
python3-dev \
python3-pip
RUN apt-get install -t stretch-backports -y \
nvidia-cuda-dev \
nvidia-cuda-toolkit \
&& \
rm -rf /var/lib/apt/lists/*
# Bazel
# Running bazel inside a `docker build` command causes trouble, cf:
# https://github.com/bazelbuild/bazel/issues/134
# The easiest solution is to set up a bazelrc file forcing --batch.
RUN echo "startup --batch" >>/etc/bazel.bazelrc
# Similarly, we need to workaround sandboxing issues:
# https://github.com/bazelbuild/bazel/issues/418
RUN echo "build --spawn_strategy=standalone --genrule_strategy=standalone" \
>>/etc/bazel.bazelrc
ENV BAZEL_VERSION 0.16.0
WORKDIR /
RUN mkdir /bazel && \
cd /bazel && \
curl -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36" -fSsL -O https://github.com/bazelbuild/bazel/releases/download/$BAZEL_VERSION/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
curl -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36" -fSsL -o /bazel/LICENSE.txt https://raw.githubusercontent.com/bazelbuild/bazel/master/LICENSE && \
chmod +x bazel-*.sh && \
./bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
cd / && \
rm -f /bazel/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh
# Tensorflow
WORKDIR /tensorflow
RUN git clone --branch=r1.8 --depth=1 https://github.com/tensorflow/tensorflow.git .
# Configure the build for our CUDA configuration.
ENV TF_NEED_GCP=0 \
TF_NEED_HDFS=0 \
TF_NEED_OPENCL=0 \
TF_NEED_JEMALLOC=0 \
TF_ENABLE_XLA=0 \
TF_NEED_VERBS=0 \
TF_CUDA_CLANG=0 \
TF_DOWNLOAD_CLANG=0 \
TF_NEED_MKL=0 \
TF_DOWNLOAD_MKL=0 \
TF_NEED_MPI=0 \
TF_NEED_S3=0 \
TF_NEED_KAFKA=0 \
TF_NEED_GDR=0 \
TF_NEED_OPENCL_SYCL=0 \
TF_SET_ANDROID_WORKSPACE=0 \
TF_NEED_AWS=0 \
CI_BUILD_PYTHON=python3 \
LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/:$LD_LIBRARY_PATH \
TF_NEED_TENSORRT=1 \
TF_NEED_CUDA=1 \
TF_CUDA_COMPUTE_CAPABILITIES=3.5,5.2,6.0,6.1,7.0 \
TF_CUDA_VERSION=9.1 \
TF_CUDNN_VERSION=7 \
CUDA_TOOLKIT_PATH=/usr/lib/x86_64-linux-gnu/ \
CUDNN_INSTALL_PATH=/usr/lib/x86_64-linux-gnu/
RUN LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/:${LD_LIBRARY_PATH} \
tensorflow/tools/ci_build/builds/configured GPU \
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.1 --copt=-msse4.2 --config=cuda \
--cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" \
tensorflow/tools/pip_package:build_pip_package && \
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/pip
My main problem comes near the end. Although I set LD_LIBRARY_PATH, CUDA_TOOLKIT_PATH and CUDNN_INSTALL_PATH to /usr/lib/x86_64-linux-gnu/, Bazel ignores them and fails with "Invalid path to CUDA 9.1 toolkit. /usr/lib/x86_64-linux-gnu/lib64/libcudart.so.9.1 cannot be found". There is an extra /lib64/ in that path, and I don't know where it comes from. The full error is shown below.
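The shape of the error suggests that the configure step does not search CUDA_TOOLKIT_PATH for the library. It seems to append a fixed relative suffix, lib64/libcudart.so.<version>, to it. The sketch below reproduces that path construction so the expected location can be checked inside the image. The function name expected_cudart_path is hypothetical, and the suffix is an assumption inferred from the error message, not taken from the TensorFlow sources.

```shell
#!/bin/sh
# Hypothetical helper: rebuild the path that the error message reports.
# Assumption: the configure step joins the toolkit path with a fixed
# "lib64/libcudart.so.<version>" suffix (inferred from the error text).
expected_cudart_path() {
    toolkit_path="$1"   # e.g. the value of CUDA_TOOLKIT_PATH
    cuda_version="$2"   # e.g. the value of TF_CUDA_VERSION
    # Strip one trailing slash so the joined path has no "//".
    printf '%s/lib64/libcudart.so.%s\n' "${toolkit_path%/}" "$cuda_version"
}

candidate="$(expected_cudart_path /usr/lib/x86_64-linux-gnu/ 9.1)"
if [ -e "$candidate" ]; then
    echo "found:   $candidate"
else
    echo "missing: $candidate"
fi
```

Running this inside the image should report whether the path from the error message exists, which separates a wrong CUDA_TOOLKIT_PATH from a layout mismatch between the Debian packages and what the build expects.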
Inspecting the Docker image, I can see that libcudart.so.9.1 is in the given path, /usr/lib/x86_64-linux-gnu/, but I am not sure why Bazel insists on appending /lib64/ to it.
How can I force the path so that it stops adding the extra lib64?
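One direction that might be worth trying, if the lib64 suffix turns out to be fixed, is to give the build a toolkit-shaped directory instead of changing the suffix: a prefix whose lib64 entry is a symlink to the Debian library directory. The sketch below is untested against this build. The function name make_cuda_prefix and the directory /opt/cuda-prefix are hypothetical, and the build may also expect other subdirectories (for example bin/ for nvcc or include/ for headers) that this sketch does not create.

```shell
#!/bin/sh
# Hypothetical helper: create <prefix>/lib64 as a symlink to an existing
# library directory, so that <prefix>/lib64/libcudart.so.<version> resolves.
make_cuda_prefix() {
    lib_dir="$1"   # directory that actually contains libcudart.so.*
    prefix="$2"    # toolkit-shaped directory to create
    mkdir -p "$prefix" || return 1
    # Remove a stale symlink from an earlier run; a real directory at
    # this path is not handled and would need to be cleaned up by hand.
    rm -f "$prefix/lib64"
    ln -s "$lib_dir" "$prefix/lib64"
}

# Example (not executed here; the prefix path is an assumption):
#   make_cuda_prefix /usr/lib/x86_64-linux-gnu /opt/cuda-prefix
#   export CUDA_TOOLKIT_PATH=/opt/cuda-prefix
```

In the Dockerfile this would correspond to a RUN step before the configure step, with CUDA_TOOLKIT_PATH pointing at the new prefix instead of /usr/lib/x86_64-linux-gnu/.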
Thanks in advance.
Excerpt from the Bazel configuration output:
You have bazel 0.16.0 installed.
Found possible Python library paths:
/usr/local/lib/python3.5/dist-packages
/usr/lib/python3/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python3.5/dist-packages]
Invalid path to CUDA 9.1 toolkit. /usr/lib/x86_64-linux-gnu/lib64/libcudart.so.9.1 cannot be found
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]:
Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Invalid path to CUDA 9.0 toolkit. /usr/local/cuda/lib64/libcudart.so.9.0 cannot be found
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]:
Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Invalid path to CUDA 9.0 toolkit. /usr/local/cuda/lib64/libcudart.so.9.0 cannot be found
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]:
Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Invalid path to CUDA 9.0 toolkit. /usr/local/cuda/lib64/libcudart.so.9.0 cannot be found
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]:
Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Invalid path to CUDA 9.0 toolkit. /usr/local/cuda/lib64/libcudart.so.9.0 cannot be found
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]:
Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Invalid path to CUDA 9.0 toolkit. /usr/local/cuda/lib64/libcudart.so.9.0 cannot be found
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]:
Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Invalid path to CUDA 9.0 toolkit. /usr/local/cuda/lib64/libcudart.so.9.0 cannot be found
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]:
Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Invalid path to CUDA 9.0 toolkit. /usr/local/cuda/lib64/libcudart.so.9.0 cannot be found
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]:
Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Invalid path to CUDA 9.0 toolkit. /usr/local/cuda/lib64/libcudart.so.9.0 cannot be found
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]:
Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Invalid path to CUDA 9.0 toolkit. /usr/local/cuda/lib64/libcudart.so.9.0 cannot be found
Traceback (most recent call last):
File "configure.py", line 1580, in <module>
main()
File "configure.py", line 1515, in main
set_tf_cuda_version(environ_cp)
File "configure.py", line 895, in set_tf_cuda_version
_DEFAULT_PROMPT_ASK_ATTEMPTS)
__main__.UserInputError: Invalid TF_CUDA_SETTING setting was provided 10 times in a row. Assuming to be a scripting mistake.
The command '/bin/sh -c LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:${LD_LIBRARY_PATH} tensorflow/tools/ci_build/builds/configured GPU bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.1 --copt=-msse4.2 --config=cuda --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" tensorflow/tools/pip_package:build_pip_package && bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/pip' returned a non-zero code: 1