I get the error above (Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR) when I run the code below. I checked whether my GPU is working using tf.test.is_gpu_available.
# coding: utf-8
import tensorflow as tf
import numpy as np
import keras
from models import *
import os
import gc
TF_FORCE_GPU_ALLOW_GROWTH = True
np.random.seed(1000)
# Paths
MODEL_CONF = "../models/conf/"
MODEL_WEIGHTS = "../models/weights/"
# Model information
N_CLASSES = 3
def load_array(name):
    return np.load(name, allow_pickle=True)
gc.collect()
dirData = "saved_data/"
trainDir = dirData + "train/"
model = AdaptedLeNet((168, 168, 8), N_CLASSES)
model.summary(print_fn=lambda x: print(x + '\n'))
# Compile the model with the specified loss function.
model.compile(optimizer=keras.optimizers.Adam(),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
for filename in os.listdir(trainDir):
    data = load_array(trainDir + filename)
    train = data["a"]
    labels = data["b"].astype(int).reshape(-1)
    one_hot_targets = np.eye(N_CLASSES)[labels]
    model.fit(x=train, y=one_hot_targets, batch_size=32, epochs=5)
    gc.collect()
The output of this code:
Epoch 1/5
2020-04-03 18:50:43.397010: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-03 18:50:43.608330: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-03 18:50:44.274270: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-03 18:50:44.275686: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-03 18:50:44.275747: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node conv2d_1/convolution}}]]
Traceback (most recent call last):
  File "cnnAlert.py", line 62, in <module>
    model.fit(x=train, y=one_hot_targets, batch_size=32, epochs=5)
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/keras/engine/training.py", line 1239, in fit
    validation_freq=validation_freq)
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 196, in fit_loop
    outs = fit_function(ins_batch)
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/keras/backend.py", line 3727, in __call__
    outputs = self._graph_fn(*converted_inputs)
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1551, in __call__
    return self._call_impl(args, kwargs)
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1591, in _call_impl
    return self._call_flat(args, self.captured_inputs, cancellation_manager)
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 545, in call
    ctx=ctx)
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node conv2d_1/convolution (defined at /home/geodatin/env/py3GEE/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:3009) ]] [Op:__inference_keras_scratch_graph_2350]
Function call stack:
keras_scratch_graph
Additional information:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1660    Off  | 00000000:01:00.0  On |                  N/A |
| 27%   41C    P8     9W / 120W |    211MiB /  5911MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       989      G   /usr/lib/xorg/Xorg                            78MiB |
|    0      1438      G   cinnamon                                      31MiB |
|    0      8622      G   ...uest-channel-token=16736224539216711033    99MiB |
+-----------------------------------------------------------------------------+
How can I fix this error? Can you help me?
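One detail of the script is worth pointing out: the line `TF_FORCE_GPU_ALLOW_GROWTH = True` only creates a Python variable, whereas TensorFlow reads the setting of that name from the process environment. A minimal sketch of setting it as an environment variable follows, assuming it has to be in place before TensorFlow initializes the GPU. Whether this resolves the cuDNN error on this machine is not confirmed; the guarded import is only there so the snippet also runs where TensorFlow is not installed.

```python
import os

# Set the variable in the process environment (not as a Python name),
# and do it before TensorFlow is imported so the GPU allocator sees it.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

try:
    import tensorflow as tf
except ImportError:
    # Assumption for this sketch only: tolerate a missing TensorFlow install.
    tf = None

if tf is not None:
    # List the GPUs TensorFlow can see (TF 2.1 exposes this under
    # tf.config.experimental).
    print(tf.config.experimental.list_physical_devices("GPU"))
```

The same effect can be had from the shell by exporting `TF_FORCE_GPU_ALLOW_GROWTH=true` before launching the script.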
EDIT 1
- CUDNN_VERSION from cudnn.h: 7605 (7.6.5)
- Host compiler version: GCC 7.5.0
- TensorFlow: 2.1.0-rc0
- The cuDNN library is on my LD_LIBRARY_PATH
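For anyone comparing versions: the CUDNN_VERSION value is a single integer. In cuDNN 7.x, cudnn.h computes it as CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL, which is how 7605 maps to 7.6.5. A small sketch of that decoding is below; the helper name `decode_cudnn_version` is hypothetical, and the formula is assumed to be the 7.x one (other major releases may encode the number differently).

```python
def decode_cudnn_version(version: int) -> str:
    """Split a cuDNN 7.x CUDNN_VERSION integer into major.minor.patch."""
    # CUDNN_VERSION = major * 1000 + minor * 100 + patch (cuDNN 7.x layout).
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return f"{major}.{minor}.{patch}"


print(decode_cudnn_version(7605))  # prints 7.6.5
```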