I am relatively new to Python and am currently trying to use CUDA with a particular neural network, Edge-Conditioned Convolution on Graphs. The code can be found here: https://github.com/mys007/ecc
I know there are several questions similar to mine, but I could not solve my problem with their answers.
I would like to train on a dataset using CUDA, but the process stops during training at a (random) epoch with the following error:
File "./main.py", line 317, in <module>
main()
File "./main.py", line 219, in main
acc_train, loss, t_loader, t_trainer = train(epoch)
File "./main.py", line 150, in train
outputs = model(inputs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/workspace/ECC_Test/models.py", line 105, in forward
input = module(input)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/workspace/ECC_Test/ecc/GraphConvModule.py", line 173, in forward
return GraphConvFunction(self._in_channels, self._out_channels, idxn, idxe, degs, degs_gpu, self._edge_mem_limit)(input, weights)
File "/workspace/ECC_Test/ecc/GraphConvModule.py", line 69, in forward
cuda_kernels.conv_aggregate_fw(output.narrow(0,startd,numd), products.view(-1,self._out_channels), self._degs_gpu.narrow(0,startd,numd))
File "/workspace/ECC_Test/ecc/cuda_kernels.py", line 122, in conv_aggregate_fw
block=(CUDA_NUM_THREADS,1,1), grid=(GET_BLOCKS(w),n//blockDimY+1,1), stream=stream)
File "cupy/cuda/function.pyx", line 148, in cupy.cuda.function.Function.__call__
File "cupy/cuda/function.pyx", line 130, in cupy.cuda.function._launch
File "cupy/cuda/driver.pyx", line 228, in cupy.cuda.driver.launchKernel
File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
File "cupy/cuda/driver.pyx", line 192, in cupy.cuda.driver.moduleUnload
File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
File "cupy/cuda/driver.pyx", line 192, in cupy.cuda.driver.moduleUnload
File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
File "cupy/cuda/driver.pyx", line 192, in cupy.cuda.driver.moduleUnload
File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
File "cupy/cuda/driver.pyx", line 192, in cupy.cuda.driver.moduleUnload
File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
File "cupy/cuda/driver.pyx", line 192, in cupy.cuda.driver.moduleUnload
File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
File "cupy/cuda/driver.pyx", line 192, in cupy.cuda.driver.moduleUnload
File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
File "cupy/cuda/driver.pyx", line 192, in cupy.cuda.driver.moduleUnload
File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
File "cupy/cuda/driver.pyx", line 192, in cupy.cuda.driver.moduleUnload
File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
The traceback above was produced with CUDA_LAUNCH_BLOCKING=1 set.
Switching to the CPU and disabling CUDA works fine.
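For reference, here is a minimal sketch of one way to enable blocking launches from inside the script rather than from the shell. It assumes the variable is set before anything initializes CUDA (safest: before importing torch or cupy). The helper name `launch_blocking_enabled` is hypothetical and not part of the ECC code.

```python
import os

# Set this before the first CUDA call (ideally before importing torch or cupy),
# so that kernel launches run synchronously and the Python traceback points at
# the launch that actually failed.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def launch_blocking_enabled():
    """Hypothetical helper: report whether the variable is set to "1"."""
    return os.environ.get("CUDA_LAUNCH_BLOCKING") == "1"


print("CUDA_LAUNCH_BLOCKING enabled:", launch_blocking_enabled())
```

The same effect can be had from the shell with `CUDA_LAUNCH_BLOCKING=1 python main.py`.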
I use SSH to access a server with 4 Nvidia Tesla V100 32GB GPUs and driver version 410.104. CUDA 10.1 and Python 3.6.8 are installed.
I am currently on PyTorch 1.1. Could the newer PyTorch version in combination with CUDA 10.1 be causing the problem? Or am I running out of GPU memory?
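To help narrow down both questions, the following sketch prints the versions PyTorch itself reports, plus the total and currently allocated memory per GPU. It is only a diagnostic under assumptions: the function name `report_environment` is hypothetical, and if torch is not importable it falls back to reporting only the Python version.

```python
import sys


def report_environment():
    """Hypothetical helper: collect version and GPU memory information."""
    info = {"python": sys.version.split()[0]}
    try:
        import torch
    except ImportError:
        # torch is not installed in this interpreter; report that and stop.
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (may differ from the system toolkit).
    info["torch_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()

    if info["cuda_available"]:
        gpus = []
        for i in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(i)
            gpus.append({
                "name": props.name,
                "total_mib": props.total_memory // 2**20,
                # Memory currently held by tensors in this process on device i.
                "allocated_mib": torch.cuda.memory_allocated(i) // 2**20,
            })
        info["gpus"] = gpus
    return info


for key, value in report_environment().items():
    print(key, value)
```

Calling `report_environment()` inside the training loop, just before the failing forward pass, would show whether allocated memory is anywhere near the 32 GB limit.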