Tf 2: не удалось создать дескриптор cudnn: CUDNN_STATUS_INTERNAL_ERROR - PullRequest
0 голосов
/ 04 апреля 2020

Я получаю вышеуказанную ошибку (не удалось создать дескриптор cudnn: CUDNN_STATUS_INTERNAL_ERROR) при выполнении кода ниже. Я проверял, работает ли мой графический процессор с помощью tf.test.is_gpu_available

# coding: utf-8

import tensorflow as tf
import numpy as np
import keras
from models import *
import os 
import gc 

TF_FORCE_GPU_ALLOW_GROWTH = True

np.random.seed(1000)
#Paths
MODEL_CONF = "../models/conf/"
MODEL_WEIGHTS = "../models/weights/"
#Model informations
N_CLASSES = 3


def load_array(name):
    return np.load(name, allow_pickle = True)


gc.collect()

dirData = "saved_data/"
trainDir = dirData + "train/"

model = AdaptedLeNet((168, 168, 8), N_CLASSES)
model.summary(print_fn=lambda x: print(x + '\n'))

# Compile the model with the specified loss function.
model.compile(optimizer=keras.optimizers.Adam(),
            loss='categorical_crossentropy',
            metrics=['accuracy'])

for filename in os.listdir(trainDir):
    data = load_array(trainDir + filename)

    train = data["a"]
    labels = data["b"].astype(int).reshape(-1) 
    one_hot_targets = np.eye(N_CLASSES)[labels]

    model.fit(x=train, y=one_hot_targets, batch_size=32, epochs=5)

    gc.collect()

Вывод этого кода:

Epoch 1/5
2020-04-03 18:50:43.397010: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-03 18:50:43.608330: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-03 18:50:44.274270: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-03 18:50:44.275686: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-03 18:50:44.275747: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[{{node conv2d_1/convolution}}]]
Traceback (most recent call last):
  File "cnnAlert.py", line 62, in <module>
    model.fit(x=train, y=one_hot_targets, batch_size=32, epochs=5)
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/keras/engine/training.py", line 1239, in fit
    validation_freq=validation_freq)
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 196, in fit_loop
    outs = fit_function(ins_batch)
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/keras/backend.py", line 3727, in __call__
    outputs = self._graph_fn(*converted_inputs)
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1551, in __call__
    return self._call_impl(args, kwargs)
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1591, in _call_impl
    return self._call_flat(args, self.captured_inputs, cancellation_manager)
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 545, in call
    ctx=ctx)
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[node conv2d_1/convolution (defined at /home/geodatin/env/py3GEE/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:3009) ]] [Op:__inference_keras_scratch_graph_2350]

Function call stack:
keras_scratch_graph

Дополнительная информация:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1660    Off  | 00000000:01:00.0  On |                  N/A |
| 27%   41C    P8     9W / 120W |    211MiB /  5911MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       989      G   /usr/lib/xorg/Xorg                            78MiB |
|    0      1438      G   cinnamon                                      31MiB |
|    0      8622      G   ...uest-channel-token=16736224539216711033    99MiB |
+-----------------------------------------------------------------------------+
3

Как мне решить эту ошибку? Вы можете помочь мне?

РЕДАКТИРОВАТЬ 1

  • CUDNN_VERSION от cudnn.h: 7605 (7.6.5)
  • Версия компилятора хоста: G CC 7.5 .0
  • Tensorflow: 2.1.0-rc0;
  • CUDNN lib находится в моем LD_LIBRARY_PATH

1 Ответ

1 голос
/ 15 апреля 2020

Возможно, вам потребуется установить для сеанса tenorflow config.gpu_option.allow_growth значение true, что можно сделать, добавив в начало кода следующее:

gpu_options = tf.GPUOptions(allow_growth=True)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
keras.backend.tensorflow_backend.set_session(sess)
Добро пожаловать на сайт PullRequest, где вы можете задавать вопросы и получать ответы от других членов сообщества.
...