RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
March 29, 2020

I get an error when running my script. My system configuration is listed below.

OS: CentOS Linux 7
PyTorch version: 1.1.0
TensorFlow version: 1.2.0
Python version: 3.6.8
CUDA / cuDNN version: 8.0 / 7.0.5
GPU: Nvidia GeForce GTX 1080

/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [1426,0,0], thread: [96,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
...
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [1426,0,0], thread: [122,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [1426,0,0], thread: [123,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [1426,0,0], thread: [124,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [1426,0,0], thread: [125,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [1426,0,0], thread: [126,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [1426,0,0], thread: [127,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
Traceback (most recent call last):
  File "./train.py", line 115, in <module>
    trainer.train()
  File "../../tasks/semantic/modules/trainer.py", line 239, in train
    show_scans=self.ARCH["train"]["show_scans"])
  File "../../tasks/semantic/modules/trainer.py", line 320, in train_epoch
    output = model(in_vol, proj_mask)
  File "/home/media-server/.pyenv/versions/anaconda3-5.0.0/envs/rangenet++/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "../../tasks/semantic/modules/segmentator.py", line 149, in forward
    y, skips = self.backbone(x)
  File "/home/media-server/.pyenv/versions/anaconda3-5.0.0/envs/rangenet++/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "../..//backbones/darknet.py", line 171, in forward
    x, skips, os = self.run_layer(x, self.conv1, skips, os)
  File "../..//backbones/darknet.py", line 154, in run_layer
    y = layer(x)
  File "/home/media-server/.pyenv/versions/anaconda3-5.0.0/envs/rangenet++/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/media-server/.pyenv/versions/anaconda3-5.0.0/envs/rangenet++/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 338, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

I tried running sudo rm -rf ~/.nv and rebooting, but that did not help. I still get RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR.
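
The assertion repeated in the log before the traceback is index >= -sizes[i] && index < sizes[i], i.e. some index used on the GPU falls outside the dimension it indexes. A minimal sketch of the same check in plain Python follows; it could be applied on the CPU to the index values (for a tensor, something like tensor.flatten().tolist()) before they reach the model. The function name find_out_of_bounds and the sample values are hypothetical and do not come from the original code.

```python
def find_out_of_bounds(indices, size):
    """Return (position, value) pairs that violate -size <= value < size.

    This mirrors the condition in the IndexKernel.cu assertion from the log:
    index >= -sizes[i] && index < sizes[i]
    """
    return [
        (pos, idx)
        for pos, idx in enumerate(indices)
        if not (-size <= idx < size)
    ]


# Hypothetical example: a lookup table with 20 entries and a few raw indices.
bad = find_out_of_bounds([0, 19, 20, -20, -21, 255], size=20)
print(bad)  # [(2, 20), (4, -21), (5, 255)]
```

An empty result means every index is in range. Because CUDA kernels run asynchronously, the Python traceback may point at a later operation (here the convolution) rather than at the indexing operation that actually tripped the assertion, so a check like this on the input data can be easier to interpret than the traceback alone.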

...