CUDA runtime error cudaErrorNoDevice: no CUDA-capable device is detected
0 votes
/ March 18, 2019

I am using Chainer and CuPy for CUDA 8.0. I am trying to train machine learning models with a python3.5 script, but I got this error:

cupy.cuda.runtime.CUDARuntimeError: cudaErrorNoDevice: no CUDA-capable device is detected

What can I do to solve this problem?

Environment information for the machine on which I tried to train my deep learning model, with the output of nvidia-smi, echo $CUDA_PATH, and echo $LD_LIBRARY_PATH:

root@awsml04:~# nvidia-smi
Thu Mar 21 10:37:19 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
| N/A   38C    P0    24W / 300W |      0MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Check the CUDA path:

root@awsml04:~# echo $CUDA_PATH
/usr/local/cuda/bin:/usr/local/cuda-9.0

Check LD_LIBRARY_PATH:

root@awsml04:~# echo $LD_LIBRARY_PATH
/usr/local/cuda/lib64{LD_LIBRARY_PATH:+:/usr/local/cuda-9.0/lib64:/usr/local/cuda/lib64{LD_LIBRARY_PATH:+:/usr/lib64/openmpi/lib/:/usr/local/cuda/lib64:/usr/local/lib:/usr/lib:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/mpi/lib:/lib/:}}

Check env | grep CUDA:

root@awsml04:~# env | grep CUDA
CUDA_PATH=/usr/local/cuda/bin:
LD_LIBRARY_PATH_WITH_DEFAULT_CUDA=/usr/lib64/openmpi/lib/:/usr/local/cuda/lib64:/usr/local/lib:/usr/lib:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/mpi/lib:/lib/:/usr/local/cuda-9.0/lib/:
LD_LIBRARY_PATH_WITHOUT_CUDA=/usr/lib64/openmpi/lib/:/usr/local/lib:/usr/lib:/usr/local/mpi/lib:/lib/:
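
Two details stand out in the output above: CUDA_PATH holds two colon-separated entries, and LD_LIBRARY_PATH contains literal `{LD_LIBRARY_PATH:+:` fragments, which look like a shell expansion written without its leading `$`. Whether either of these causes the error is not established here. A minimal sketch of a check for such values, assuming both variables are meant to hold absolute paths; `suspicious_entries` is a hypothetical helper, not part of CuPy or Chainer:

```python
import os


def suspicious_entries(value):
    """Return (entry, reason) pairs for malformed entries in a colon-separated path list."""
    problems = []
    for entry in value.split(":"):
        if entry == "":
            problems.append((entry, "empty entry"))
        elif "{" in entry or "}" in entry or "$" in entry:
            problems.append((entry, "unexpanded shell syntax"))
        elif not os.path.isabs(entry):
            problems.append((entry, "not an absolute path"))
    return problems


if __name__ == "__main__":
    for name in ("CUDA_PATH", "LD_LIBRARY_PATH"):
        value = os.environ.get(name)
        if value is None:
            print(name, "is not set")
            continue
        for entry, reason in suspicious_entries(value):
            print("{}: {!r} ({})".format(name, entry, reason))
    # Assumption: CUDA_PATH is meant to name a single CUDA install directory.
    cuda_path = os.environ.get("CUDA_PATH", "")
    if len([e for e in cuda_path.split(":") if e]) > 1:
        print("CUDA_PATH holds more than one entry")
```

Run it in the same shell that launches the training script, since the values may differ between sessions.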

Check the python3 path:

root@awsml04:~# which python3
/usr/bin/python3

Check the pip3 path:

root@awsml04:~# which pip3
/usr/bin/pip3

Installed Python libraries with their versions:

root@awsml04:~# pip3 freeze
absl-py==0.7.1
alabaster==0.7.12
alembic==1.0.8
appdirs==1.4.3
APScheduler==3.5.3
astor==0.7.1
astroid==2.1.0
awscli==1.16.76
Babel==2.6.0
backcall==0.1.0
beautifulsoup4==4.4.1
bleach==1.5.0
blinker==1.3
bokeh==1.0.3
boto==2.49.0
boto3==1.9.72
botocore==1.12.72
certifi==2018.11.29
chainer==5.3.0
chainerui==0.3.0
chardet==3.0.4
Click==7.0
cloud-init==18.5
cloudpickle==0.6.1
colorama==0.3.9
command-not-found==0.3
configobj==5.0.6
cpplint==1.3.0
cryptography==1.2.3
cycler==0.10.0
dask==1.0.0
decorator==4.3.0
defer==1.0.6
defusedxml==0.5.0
docutils==0.14
easydict==1.9
entrypoints==0.2.3
enum34==1.1.6
environment-kernels==1.1.1
fastrlock==0.4
filelock==2.0.13
Flask==1.0.2
future==0.17.1
gast==0.2.2
glog==0.3.1
graphviz==0.10.1
grpcio==1.19.0
h5py==2.7.1
hibagent==1.0.1
html5lib==0.9999999
idna==2.8
imagesize==1.1.0
ipykernel==5.1.0
ipyparallel==6.2.3
ipython==7.2.0
ipython-genutils==0.2.0
ipywidgets==7.4.2
isort==4.3.4
itsdangerous==1.1.0
jedi==0.13.2
Jinja2==2.10
jmespath==0.9.3
jsonpatch==1.10
jsonpointer==1.9
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.2.4
jupyter-console==6.0.0
jupyter-core==4.4.0
Keras==2.2.4
Keras-Applications==1.0.7
Keras-Preprocessing==1.0.9
kiwisolver==1.0.1
language-selector==0.1
lazy-object-proxy==1.3.1
lxml==3.5.0
Mako==1.0.7
Markdown==2.6.10
MarkupSafe==1.1.0
matplotlib==3.0.2
mccabe==0.6.1
mistune==0.8.4
mock==2.0.0
msgpack==0.6.1
nbconvert==5.4.0
nbformat==4.4.0
networkx==2.2
nose==1.3.7
notebook==5.7.4
numpy==1.15.1
oauthlib==1.0.3
olefile==0.44
opencv-python==3.4.1.15
packaging==18.0
pandas==0.23.4
pandocfilters==1.4.2
parso==0.3.1
pbr==5.1.3
pexpect==4.6.0
pickleshare==0.7.5
Pillow==4.3.0
prettytable==0.7.2
prometheus-client==0.5.0
prompt-toolkit==2.0.7
protobuf==3.7.0
ptyprocess==0.6.0
pyasn1==0.4.5
pycups==1.9.73
pycurl==7.43.0
pydot==1.4.1
pygal==2.4.0
Pygments==2.3.1
pygobject==3.20.0
PyJWT==1.3.0
pylint==2.2.2
pyparsing==2.2.0
pyserial==3.0.1
python-apt==1.1.0b1+ubuntu0.16.4.2
python-dateutil==2.6.1
python-debian==0.1.27
python-editor==1.0.4
python-gflags==3.1.2
python-systemd==231
pytz==2017.3
PyWavelets==1.0.1
pyxdg==0.25
PyYAML==3.13
pyzmq==17.1.2
qtconsole==4.4.3
requests==2.21.0
roman==2.0.0
rsa==3.4.2
s3transfer==0.1.13
scikit-image==0.14.1
scikit-learn==0.20.2
scipy==1.2.0
screen-resolution-extra==0.0.0
seaborn==0.9.0
Send2Trash==1.5.0
six==1.12.0
snowballstemmer==1.2.1
Sphinx==1.8.3
sphinx-rtd-theme==0.1.9
sphinxcontrib-websupport==1.1.0
SQLAlchemy==1.3.1
ssh-import-id==5.5
system-service==0.3
tensorboard==1.12.2
tensorflow==1.12.0
tensorflow-estimator==1.13.0
tensorflow-gpu==1.12.0
tensorflow-tensorboard==0.4.0rc3
termcolor==1.1.0
terminado==0.8.1
testpath==0.4.2
toolz==0.9.0
tornado==5.1.1
tqdm==4.19.5
traitlets==4.3.2
typed-ast==1.1.1
tzlocal==1.5.1
ufw==0.35
unattended-upgrades==0.1
urllib3==1.24.1
virtualenv==15.0.1
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==0.13
widgetsnbextension==3.4.2
wrapt==1.10.11
xkit==0.0.0

Chainer CUDA runtime information:

root@awsml04:~#  python3 -c "import chainer; print(chainer.print_runtime_info())"
/usr/lib/python3.5/site-packages/chainer/backends/cuda.py:98: UserWarning: cuDNN is not enabled.
Please reinstall CuPy after you install cudnn
(see https://docs-cupy.chainer.org/en/stable/install.html#install-cudnn).
  'cuDNN is not enabled.\n'
/usr/lib/python3.5/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Platform: Linux-4.4.0-1077-aws-x86_64-with-Ubuntu-16.04-xenial
Chainer: 5.3.0
NumPy: 1.15.1
CuPy:
  CuPy Version          : 5.3.0
  CUDA Root             : /usr/local/cuda/bin:/usr/local/cuda-9.0
  CUDA Build Version    : 9000
  CUDA Driver Version   : 9000
  CUDA Runtime Version  : 9000
  cuDNN Build Version   : None
  cuDNN Version         : None
  NCCL Build Version    : 2307
  NCCL Runtime Version  : 2307
iDeep: Not Available
None

root@awsml04:~# python3 -c "import cupy; print(cupy.empty((3, 3)))"
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


Full error traceback:

stacktrace.py
Exception in main training loop: cudaErrorNoDevice: no CUDA-capable device is detected
Traceback (most recent call last):
  File "/root/.see-master/lib/python3.5/site-packages/chainer/training/trainer.py", line 302, in run
    entry.extension(self)
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/root/.see-master/lib/python3.5/site-packages/chainer/reporter.py", line 98, in scope
    yield
  File "/root/.see-master/lib/python3.5/site-packages/chainer/training/trainer.py", line 299, in run
    update()
  File "/root/.see-master/lib/python3.5/site-packages/chainer/training/updater.py", line 223, in update
    self.update_core()
  File "/root/.see-master/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 195, in update_core
    self.setup_workers()
  File "/root/.see-master/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 186, in setup_workers
    with cuda.Device(self._devices[0]):
  File "cupy/cuda/device.pyx", line 106, in cupy.cuda.device.Device.__enter__
  File "cupy/cuda/runtime.pyx", line 164, in cupy.cuda.runtime.getDevice
  File "cupy/cuda/runtime.pyx", line 136, in cupy.cuda.runtime.check_status
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
  File "chainer/train_svhn.py", line 258, in <module>
    trainer.run()
  File "/root/.see-master/lib/python3.5/site-packages/chainer/training/trainer.py", line 313, in run
    six.reraise(*sys.exc_info())
  File "/usr/lib/python3.5/site-packages/six.py", line 693, in reraise
    raise value
  File "/root/.see-master/lib/python3.5/site-packages/chainer/training/trainer.py", line 302, in run
    entry.extension(self)
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/root/.see-master/lib/python3.5/site-packages/chainer/reporter.py", line 98, in scope
    yield
  File "/root/.see-master/lib/python3.5/site-packages/chainer/training/trainer.py", line 299, in run
    update()
  File "/root/.see-master/lib/python3.5/site-packages/chainer/training/updater.py", line 223, in update
    self.update_core()
  File "/root/.see-master/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 195, in update_core
    self.setup_workers()
  File "/root/.see-master/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 186, in setup_workers
    with cuda.Device(self._devices[0]):
  File "cupy/cuda/device.pyx", line 106, in cupy.cuda.device.Device.__enter__
  File "cupy/cuda/runtime.pyx", line 164, in cupy.cuda.runtime.getDevice
  File "cupy/cuda/runtime.pyx", line 136, in cupy.cuda.runtime.check_status
cupy.cuda.runtime.CUDARuntimeError: cudaErrorNoDevice: no CUDA-capable device is detected
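
Note that the warnings in the runtime info come from /usr/lib/python3.5/site-packages, while the traceback loads Chainer from /root/.see-master/lib/python3.5/site-packages, so the diagnostics and the training run may be using different installations. A small sketch that prints where the packages resolve from, meant to be run with the same interpreter that runs the training script; `package_origin` is a hypothetical helper name:

```python
import importlib.util
import sys


def package_origin(name):
    """Return the file a top-level package would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("prefix:     ", sys.prefix)
    for name in ("chainer", "cupy"):
        print(name, "->", package_origin(name))
```

If the paths printed here differ from the ones in the runtime info, the checks above were probably made against a different CuPy build than the one that fails.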

1 Answer

1 vote
/ March 21, 2019

There is not enough information to guess the cause of the error, but I would simply suggest trying the following.

IMPORTANT: do NOT log out, detach, or close the shell until you have completed all of the following.

$ export CUDA_PATH=/usr/local/cuda-9.0
$ export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64
$ pip3 uninstall -y chainer cupy cupy-cuda80 cupy-cuda90 cupy-cuda92
$ pip3 install cupy-cuda90 --no-cache-dir && pip3 install chainer --no-cache-dir
$ git clone https://github.com/chainer/chainer.git && cd chainer && git checkout v5.3.0
$ python3 examples/mnist/train_mnist.py --gpu 0

If this works, try running your script again.
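
Before rerunning, it may also help to ask the CUDA runtime directly how many devices it sees, since the traceback fails inside `cupy.cuda.runtime.getDevice`. A minimal sketch, assuming CuPy is importable in the environment that runs the training script; `count_devices` is a hypothetical helper name:

```python
def count_devices():
    """Return the number of CUDA devices CuPy can see, or None if CuPy cannot be imported."""
    try:
        import cupy
    except ImportError:
        return None
    try:
        return cupy.cuda.runtime.getDeviceCount()
    except cupy.cuda.runtime.CUDARuntimeError:
        # The runtime reports an error (for example cudaErrorNoDevice)
        # when it cannot find a usable GPU; treat that as zero devices.
        return 0


if __name__ == "__main__":
    n = count_devices()
    if n is None:
        print("cupy is not importable in this environment")
    else:
        print("visible CUDA devices:", n)
```

A result of 0 in the training environment, while nvidia-smi lists the GPU, would point at the environment or the CuPy installation rather than at the hardware.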

...