Nvidia Docker: TensorFlow fails with CUDA_ERROR_NOT_SUPPORTED: operation not supported
12 November 2018

I have a virtual machine running on a server with a Tesla P4. The VM runs Jenkins, and I would also like to use nvidia-docker on it.

Running this command produces the following error:

15:07:36 + docker run --runtime=nvidia --rm tensorflow/tensorflow:latest-gpu python -c 'import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))'


15:07:38 2018-11-12 14:07:38.940584: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
15:07:39 Traceback (most recent call last):
15:07:39   File "<string>", line 1, in <module>
15:07:39   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/random_ops.py", line 73, in random_normal
15:07:39     shape_tensor = _ShapeTensor(shape)
15:07:39   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/random_ops.py", line 44, in _ShapeTensor
15:07:39     return ops.convert_to_tensor(shape, dtype=dtype, name="shape")
15:07:39   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1050, in convert_to_tensor
15:07:39     as_ref=False)
15:07:39   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1146, in internal_convert_to_tensor
15:07:39     ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
15:07:39   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 229, in _constant_tensor_conversion_function
15:07:39     return constant(v, dtype=dtype, name=name)
15:07:39   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 179, in constant
15:07:39     t = convert_to_eager_tensor(value, ctx, dtype)
15:07:39   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 99, in convert_to_eager_tensor
15:07:39     handle = ctx._handle  # pylint: disable=protected-access
15:07:39   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/eager/context.py", line 319, in _handle
15:07:39     self._initialize_handle_and_devices()
15:07:39   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/eager/context.py", line 267, in _initialize_handle_and_devices
15:07:39     self._context_handle = pywrap_tensorflow.TFE_NewContext(opts)
15:07:39 tensorflow.python.framework.errors_impl.InternalError: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_NOT_SUPPORTED: operation not supported

I have no idea where to look or what to do. The GPU should be available, and nvidia-smi inside a container does see it:

15:16:20 + docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
15:16:22 Mon Nov 12 14:16:22 2018       
15:16:22 +-----------------------------------------------------------------------------+
15:16:22 | NVIDIA-SMI 410.73       Driver Version: 410.73       CUDA Version: 10.0     |
15:16:22 |-------------------------------+----------------------+----------------------+
15:16:22 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
15:16:22 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
15:16:22 |===============================+======================+======================|
15:16:22 |   0  Tesla P4            Off  | 00000000:00:06.0 Off |                  N/A |
15:16:22 | N/A   47C    P0    24W /  75W |      0MiB /  7611MiB |      0%      Default |
15:16:22 +-------------------------------+----------------------+----------------------+
15:16:22                                                                                
15:16:22 +-----------------------------------------------------------------------------+
15:16:22 | Processes:                                                       GPU Memory |
15:16:22 |  GPU       PID   Type   Process name                             Usage      |
15:16:22 |=============================================================================|
15:16:22 |  No running processes found                                                 |
15:16:22 +-----------------------------------------------------------------------------+
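nvidia-smi queries the driver through NVML and does not create a CUDA context, so the output above does not by itself show that CUDA works in the VM. A minimal sketch for narrowing this down, assuming Python with `ctypes` and the driver library `libcuda.so.1` available (for example inside the same container): it calls the CUDA driver API directly (`cuInit`, `cuDeviceGet`, then `cuDevicePrimaryCtxRetain`, the call that fails in the traceback) without TensorFlow. If it also reports `CUDA_ERROR_NOT_SUPPORTED`, the problem is likely below TensorFlow (driver or VM/GPU configuration) rather than in the image. The function name `probe` is hypothetical.

```python
import ctypes


def _error_name(lib, code):
    """Translate a CUresult into its symbolic name via cuGetErrorName."""
    name = ctypes.c_char_p()
    if lib.cuGetErrorName(code, ctypes.byref(name)) == 0 and name.value:
        return name.value.decode()
    return "CUresult %d" % code


def probe(libname="libcuda.so.1", ordinal=0):
    """Hypothetical helper: retain the primary context of one device
    through the raw CUDA driver API, bypassing TensorFlow entirely."""
    try:
        lib = ctypes.CDLL(libname)
    except OSError as exc:
        # No driver library visible (e.g. container started without --runtime=nvidia).
        return "cannot load %s: %s" % (libname, exc)

    rc = lib.cuInit(0)
    if rc != 0:
        return "cuInit failed: " + _error_name(lib, rc)

    dev = ctypes.c_int()  # CUdevice is an int handle
    rc = lib.cuDeviceGet(ctypes.byref(dev), ordinal)
    if rc != 0:
        return "cuDeviceGet failed: " + _error_name(lib, rc)

    ctx = ctypes.c_void_p()  # CUcontext is an opaque pointer
    rc = lib.cuDevicePrimaryCtxRetain(ctypes.byref(ctx), dev)
    if rc != 0:
        return "cuDevicePrimaryCtxRetain failed: " + _error_name(lib, rc)

    # Balance the successful retain before returning.
    lib.cuDevicePrimaryCtxRelease(dev)
    return "ok"


if __name__ == "__main__":
    print(probe())
```

Saved to a file and run with `python` inside a container started with the same `--runtime=nvidia` flag, it prints either `ok` or the first failing driver call together with its error name. The code sticks to syntax that also runs on the Python 2.7 interpreter shipped in that TensorFlow image.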

What could be causing this problem?

...