Can CUDA_ERROR_LAUNCH_TIMEOUT be caught in Python code with try/except? - PullRequest
30 June 2019

I managed to compile the CUDA build of tensorflow_gpu on a low-end machine (CUDA compute capability 3.0). I run a model file on a video, but sometimes memory allocation errors occur:

2019-06-30 17:11:15.525537: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-30 17:11:16.607461: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.11GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-30 17:11:18.926863: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.99GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-30 17:11:20.068998: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.32GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-30 17:11:25.334895: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.67GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-30 17:11:33.030001: E tensorflow/stream_executor/cuda/cuda_driver.cc:981] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2019-06-30 17:11:33.030044: E tensorflow/stream_executor/cuda/cuda_timer.cc:55] Internal: error destroying CUDA event in context 0x208cd570: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2019-06-30 17:11:33.030059: E tensorflow/stream_executor/cuda/cuda_timer.cc:60] Internal: error destroying CUDA event in context 0x208cd570: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2019-06-30 17:11:33.030101: F tensorflow/stream_executor/cuda/cuda_dnn.cc:231] Check failed: status == CUDNN_STATUS_SUCCESS (7 vs. 0)Failed to set cuDNN stream.
Aborted (core dumped)

I tried wrapping the code in a try-except block, but the error was not caught. I assume the real question is how to catch, from Python code, an error raised inside a C++ library. Is that possible?
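The last log lines (an `F` fatal `Check failed` followed by `Aborted (core dumped)`) indicate that the process was terminated with SIGABRT at the C++ level. No Python exception is ever created, so try/except has nothing to catch. A minimal sketch of one possible workaround, assuming the model run can be moved into a separate process: start it with `subprocess` and inspect the child's return code from the parent. Here `os.abort()` is only a stand-in for the native TensorFlow crash, and the names `CHILD_CODE`, `run_isolated` and `describe` are hypothetical.

```python
import signal
import subprocess
import sys

# Hypothetical child script. os.abort() raises SIGABRT, standing in for the
# fatal "Check failed" abort inside TensorFlow; its try/except never triggers.
CHILD_CODE = r"""
import os
try:
    import resource  # Unix only: suppress the core file for this demo
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (0, hard))
except (ImportError, ValueError, OSError):
    pass
try:
    os.abort()       # native abort: no Python exception is raised
except BaseException:
    print("caught")  # never printed
"""


def run_isolated(code: str) -> subprocess.CompletedProcess:
    """Run `code` in a fresh interpreter so a native abort kills only the child."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )


def describe(result: subprocess.CompletedProcess) -> str:
    """Turn the child's return code into a readable message."""
    rc = result.returncode
    if rc < 0:  # POSIX: a negative return code means "killed by signal -rc"
        return f"child killed by {signal.Signals(-rc).name}"
    return f"child exited with code {rc}"  # e.g. 3 after abort() on Windows


if __name__ == "__main__":
    result = run_isolated(CHILD_CODE)
    print(describe(result))
    print("try/except in child fired:", "caught" in result.stdout)
```

With this layout the parent survives the crash and can decide what to do next (log it, skip the frame, restart the worker), at the cost of reloading the model in each new child process.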

...