I have a virtual machine running on a server with a Tesla P4.
Jenkins runs on this VM, and I would also like to use nvidia-docker.
When I run this command, the following error message appears:
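(For reference, this is the same command as it would be typed by hand; the quotation marks around the python -c argument are stripped in the Jenkins console echo below.)

docker run --runtime=nvidia --rm tensorflow/tensorflow:latest-gpu \
    python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"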
15:07:36 + docker run --runtime=nvidia --rm tensorflow/tensorflow:latest-gpu python -c import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))
15:07:38 2018-11-12 14:07:38.940584: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
15:07:39 Traceback (most recent call last):
15:07:39   File "<string>", line 1, in <module>
15:07:39   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/random_ops.py", line 73, in random_normal
15:07:39     shape_tensor = _ShapeTensor(shape)
15:07:39   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/random_ops.py", line 44, in _ShapeTensor
15:07:39     return ops.convert_to_tensor(shape, dtype=dtype, name="shape")
15:07:39   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1050, in convert_to_tensor
15:07:39     as_ref=False)
15:07:39   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1146, in internal_convert_to_tensor
15:07:39     ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
15:07:39   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 229, in _constant_tensor_conversion_function
15:07:39     return constant(v, dtype=dtype, name=name)
15:07:39   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 179, in constant
15:07:39     t = convert_to_eager_tensor(value, ctx, dtype)
15:07:39   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 99, in convert_to_eager_tensor
15:07:39     handle = ctx._handle # pylint: disable=protected-access
15:07:39   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/eager/context.py", line 319, in _handle
15:07:39     self._initialize_handle_and_devices()
15:07:39   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/eager/context.py", line 267, in _initialize_handle_and_devices
15:07:39     self._context_handle = pywrap_tensorflow.TFE_NewContext(opts)
15:07:39 tensorflow.python.framework.errors_impl.InternalError: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_NOT_SUPPORTED: operation not supported
I have no idea where to look or what to do; the card should be available, and nvidia-smi from inside a container does see it:
15:16:20 + docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
15:16:22 Mon Nov 12 14:16:22 2018
15:16:22 +-----------------------------------------------------------------------------+
15:16:22 | NVIDIA-SMI 410.73       Driver Version: 410.73       CUDA Version: 10.0     |
15:16:22 |-------------------------------+----------------------+----------------------+
15:16:22 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
15:16:22 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
15:16:22 |===============================+======================+======================|
15:16:22 |   0  Tesla P4            Off  | 00000000:00:06.0 Off |                  N/A |
15:16:22 | N/A   47C    P0    24W /  75W |      0MiB /  7611MiB |      0%      Default |
15:16:22 +-------------------------------+----------------------+----------------------+
15:16:22
15:16:22 +-----------------------------------------------------------------------------+
15:16:22 | Processes:                                                       GPU Memory |
15:16:22 |  GPU       PID   Type   Process name                             Usage      |
15:16:22 |=============================================================================|
15:16:22 |  No running processes found                                                 |
15:16:22 +-----------------------------------------------------------------------------+
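For completeness, one more check I can try is asking TensorFlow itself to list the devices it sees inside the same container; this is just a sketch using the stock TF 1.x device_lib helper, not output from the failing build:

docker run --runtime=nvidia --rm tensorflow/tensorflow:latest-gpu \
    python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"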
What could be causing this problem?