Keras error when trying to get the output of an intermediate layer: could not create cudnn handle
1 vote
/ April 12, 2020

I'm building a model with Keras. My setup:

  1. Anaconda (Python 3.7)
  2. TensorFlow-GPU (2.1)
  3. Keras (2.3.1)
  4. CUDA (10.1.2)
  5. cuDNN (7.6.5)
  6. NVIDIA driver (445.7)
  7. NVIDIA GPU: GTX 1660 Ti (6 GB)

When I run the model, this piece of code throws an error:

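from keras.models import Model

def get_gen_output(gan, noise):
    # Sub-model that shares the GAN's input and ends at layer index 24
    intermediate_model = Model(inputs=gan.input, outputs=gan.layers[24].output)
    layer_output = intermediate_model.predict(noise)
    return layer_output[0]  # first sample of the predicted batch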

This model is a CNN GAN. Other CNN models run fine for me. Only this one causes the problem. The error I get is:

Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

Other questions about the same error point to two common causes:

  1. Not enough GPU memory. I don't think this is the cause. The error appears even when I build a very small model that contains the snippet above, while large models without this code run fine.

  2. A CUDA/cuDNN compatibility problem. But according to this link, the versions listed above should work together.
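Point 2 comes down to a version comparison. TensorFlow's "tested build configurations" table lists CUDA 10.1 and cuDNN 7.6 for tensorflow_gpu-2.1.0, and only the major.minor parts have to match. The sketch below uses only the standard library and assumes those two facts. The one-entry table is copied by hand, and the helper names are hypothetical.

```python
# Tested GPU build configuration for TensorFlow 2.1, hand-copied from
# TensorFlow's "tested build configurations" table (only this entry).
TESTED_GPU_CONFIGS = {
    "2.1": {"cuda": "10.1", "cudnn": "7.6"},
}


def major_minor(version):
    """Reduce a version string such as '10.1.2' to '10.1'."""
    return ".".join(version.split(".")[:2])


def matches_tested_config(tf_version, cuda_version, cudnn_version):
    """True if CUDA/cuDNN match the tested major.minor versions for tf_version."""
    tested = TESTED_GPU_CONFIGS.get(major_minor(tf_version))
    if tested is None:
        return False  # TF version not in the table: nothing to compare against
    return (major_minor(cuda_version) == tested["cuda"]
            and major_minor(cudnn_version) == tested["cudnn"])


# Versions listed in the question
print(matches_tested_config("2.1", "10.1.2", "7.6.5"))  # True
```

The log in the update is consistent with this: cudart64_101.dll and cudnn64_7.dll both open successfully before the failure.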

Any ideas what the problem might be and how to fix it? I've been trying to solve this for several days. If you need more information (for example, a model summary), let me know in the comments and I'll add it.

UPDATE: a commenter asked me to post the logs:

(base) C:\Users\Moran>ju[yter notebook
'ju[yter' is not recognized as an internal or external command,
operable program or batch file.

(base) C:\Users\Moran>jupyter notebook
[I 16:42:41.966 NotebookApp] Serving notebooks from local directory: C:\Users\Moran
[I 16:42:41.967 NotebookApp] The Jupyter Notebook is running at:
[I 16:42:41.967 NotebookApp] http://localhost:8888/?token=ec3a664897f7d31597f7f4544609cc8c0d7b4db7450b55b1
[I 16:42:41.967 NotebookApp]  or http://127.0.0.1:8888/?token=ec3a664897f7d31597f7f4544609cc8c0d7b4db7450b55b1
[I 16:42:41.967 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 16:42:42.000 NotebookApp]

    To access the notebook, open this file in a browser:
        file:///C:/Users/Moran/AppData/Roaming/jupyter/runtime/nbserver-15820-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=ec3a664897f7d31597f7f4544609cc8c0d7b4db7450b55b1
     or http://127.0.0.1:8888/?token=ec3a664897f7d31597f7f4544609cc8c0d7b4db7450b55b1
[I 16:42:47.284 NotebookApp] Kernel started: ae448b14-33fc-471e-a2ae-991be8321434
[W 16:42:47.740 NotebookApp] 404 GET /api/kernels/4ce83e1e-9aa5-4c93-97d8-55dc16480242/channels?session_id=eaa90dc2c0bb4c448d6a01d66f4fbb21 (127.0.0.1): Kernel does not exist: 4ce83e1e-9aa5-4c93-97d8-55dc16480242
[W 16:42:47.757 NotebookApp] 404 GET /api/kernels/4ce83e1e-9aa5-4c93-97d8-55dc16480242/channels?session_id=eaa90dc2c0bb4c448d6a01d66f4fbb21 (127.0.0.1) 18.94ms referer=None
[W 16:42:49.439 NotebookApp] 404 GET /api/kernels/b9e9b610-9c5b-4565-8b85-deb70837c31f/channels?session_id=34072dd627c74e96b496ef73d99601a9 (::1): Kernel does not exist: b9e9b610-9c5b-4565-8b85-deb70837c31f
[W 16:42:49.440 NotebookApp] 404 GET /api/kernels/b9e9b610-9c5b-4565-8b85-deb70837c31f/channels?session_id=34072dd627c74e96b496ef73d99601a9 (::1) 2.00ms referer=None
2020-04-12 16:43:00.321827: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-12 16:43:02.652473: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-04-12 16:43:02.685848: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2020-04-12 16:43:02.693105: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-12 16:43:02.700970: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-12 16:43:02.708335: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-12 16:43:02.713049: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-12 16:43:02.720598: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-12 16:43:02.726428: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-12 16:43:02.738007: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-12 16:43:02.741940: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-12 16:43:02.745942: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2020-04-12 16:43:02.754621: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2020-04-12 16:43:02.761464: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-12 16:43:02.766394: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-12 16:43:02.770257: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-12 16:43:02.773975: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-12 16:43:02.777827: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-12 16:43:02.782949: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-12 16:43:02.786952: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-12 16:43:02.791207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-12 16:43:03.372450: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-12 16:43:03.376375: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2020-04-12 16:43:03.379436: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2020-04-12 16:43:03.382400: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/device:GPU:0 with 4625 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-04-12 16:43:03.966022: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2020-04-12 16:43:03.976011: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-12 16:43:03.980766: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-12 16:43:03.985179: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-12 16:43:03.988922: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-12 16:43:03.992744: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-12 16:43:03.997758: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-12 16:43:04.001856: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-12 16:43:04.006936: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-12 16:43:04.009739: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-12 16:43:04.014702: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2020-04-12 16:43:04.017351: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2020-04-12 16:43:04.020371: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4625 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
[W 16:43:04.449 NotebookApp] Replacing stale connection: 4ce83e1e-9aa5-4c93-97d8-55dc16480242:eaa90dc2c0bb4c448d6a01d66f4fbb21
2020-04-12 16:43:05.280820: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-12 16:43:06.518456: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2020-04-12 16:43:06.522375: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2020-04-12 16:43:06.525103: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[{{node 1/convolution}}]]
[W 16:43:06.741 NotebookApp] Replacing stale connection: b9e9b610-9c5b-4565-8b85-deb70837c31f:34072dd627c74e96b496ef73d99601a9
[I 16:43:08.454 NotebookApp] Saving file at /generative models/GAN.ipynb

1 Answer

1 vote
/ April 17, 2020

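Please uninstall the NVIDIA CUDA toolkit both from the system and from the Anaconda environment. The apt-get command applies to Debian/Ubuntu systems, and the conda command applies to the Anaconda environment: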

sudo apt-get remove nvidia-cuda-toolkit

conda remove cudatoolkit

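Then enable the following option when creating the TensorFlow session.

TensorFlow:

import tensorflow as tf

# tf.compat.v1 exposes the TF1 session API under TensorFlow 2.x
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand instead of reserving it up front
session = tf.compat.v1.Session(config=config)  # add any other Session arguments here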
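TensorFlow 2.x can give the same on-demand allocation without any session through `tf.config.experimental.set_memory_growth`. The sketch below assumes two things:

* It is called once at the top of the notebook kernel, before any model is built.
* The helper name `enable_gpu_memory_growth` is hypothetical.

TensorFlow raises `RuntimeError` if memory growth is changed after the GPU has been initialized.

```python
def enable_gpu_memory_growth():
    """TF 2.x: let TensorFlow grow GPU memory on demand instead of preallocating it.

    Must run before the GPU is initialized (i.e. before any model is built);
    otherwise TensorFlow raises RuntimeError.
    """
    import tensorflow as tf

    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)  # number of GPUs that were configured


n = enable_gpu_memory_growth()
print(f"memory growth enabled on {n} GPU(s)")
```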

For Keras (this uses the TensorFlow 1.x session API):

from keras.backend.tensorflow_backend import set_session
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True 
sess = tf.Session(config=config)
set_session(sess)  # set this TensorFlow session as the default session for Keras
...
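The session-based snippets above follow the TF 1.x API. With TensorFlow 2.x (the question uses 2.1), you can avoid sessions altogether with the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable. TensorFlow reads it when it creates its GPU allocator, so it also covers Keras running on the TensorFlow backend. The sketch below assumes it runs in the first cell of a freshly restarted kernel, before anything imports TensorFlow. The helper name is hypothetical. Setting the variable in the shell before starting `jupyter notebook` has the same effect.

```python
import os


def force_gpu_allow_growth():
    """Ask TensorFlow to allocate GPU memory on demand for this process.

    TensorFlow reads TF_FORCE_GPU_ALLOW_GROWTH when it creates its GPU
    allocator, so this must run before the first model touches the GPU
    (safest: before `import tensorflow` / `import keras`).
    """
    os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"


force_gpu_allow_growth()
# import keras / tensorflow and build the models only after this point
```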