CuDNN crash in TF 2.x after many training epochs
1 vote
/ May 8, 2020

I am growing more and more desperate over my TensorFlow project. It took many hours to get TensorFlow installed before I realized that PyCharm, Python 3.7 and TF 2.x are somehow incompatible. Now it runs, but after many epochs of training I get a really vague CuDNN error. Can you tell whether my code is wrong or whether this is, for example, an installation problem? Could you point me in a direction? I also found nothing specific when searching.

My setup [in brackets: what I have also tried]:

  • HW: i7-4790K, 32 GB RAM and GeForce RTX 2070 Super 8 GB
  • OS: Windows 10, 64-bit
  • Python: 3.6.8 [and 3.7 (where tf failed to install)]
  • IDE: PyCharm 2020.1.1 [and 2020.1]
  • Driver: latest "Studio" driver 442.92 [and also the latest "gaming" driver]
  • CUDA: 10.1 + the latest CuDNN DLLs for that version [I also tried 10.2, but tf does not detect it]
  • TF: 2.2.0 RC4 [, 2.0.x and 2.1.5]. All packages were installed via PyCharm (and therefore pip)
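
One thing worth checking against a list like this is whether the installed CUDA toolkit matches what the TensorFlow wheel was built against. The sketch below hard-codes the pairings as recalled from TensorFlow's "tested build configurations" table, which is an assumption to verify against the official documentation; the helper names `expected_cuda` and `cuda_matches` are hypothetical. It would also be consistent with CUDA 10.2 not being detected, since the log shows TF loading `cudart64_101.dll`, i.e. the 10.1 runtime.

```python
# Assumed pairings, recalled from TensorFlow's "tested build configurations"
# table. Verify against the official documentation before relying on them.
TESTED_GPU_BUILDS = {
    "2.0": {"cuda": "10.0", "cudnn": "7.4"},
    "2.1": {"cuda": "10.1", "cudnn": "7.6"},
    "2.2": {"cuda": "10.1", "cudnn": "7.6"},
}


def expected_cuda(tf_version):
    """Return the assumed CUDA/cuDNN pairing for a version like '2.2.0rc4'."""
    major_minor = ".".join(tf_version.split(".")[:2])
    return TESTED_GPU_BUILDS.get(major_minor)


def cuda_matches(tf_version, installed_cuda):
    """True if the installed CUDA version equals the assumed tested one."""
    expected = expected_cuda(tf_version)
    return expected is not None and expected["cuda"] == installed_cuda


if __name__ == "__main__":
    for cuda in ("10.1", "10.2"):
        print("TF 2.2.0rc4 with CUDA", cuda, "->", cuda_matches("2.2.0rc4", cuda))
```

A lookup table is used here instead of querying TensorFlow so that the check also works when the import itself fails.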

This error occurred after ~3 hours of training. In other runs (or with other parameterizations of the net) the error occurs much earlier. Here is the full output of the code snippet below:

C:\Users\Fhnx\.virtualenvs\Processing-TA9ofq3q\Scripts\python.exe C:/Users/Fhnx/.../playground/
2020-05-08 11:47:25.924424: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library cudart64_101.dll
Starting training sweep with Epochs: 10000, LRstart: 0.01, LRend: 5e-05
2020-05-08 11:47:27.887135: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library nvcuda.dll
2020-05-08 11:47:27.912998: I tensorflow/core/common_runtime/gpu/] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.815GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2020-05-08 11:47:27.913212: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library cudart64_101.dll
2020-05-08 11:47:27.921203: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library cublas64_10.dll
2020-05-08 11:47:27.930115: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library cufft64_10.dll
2020-05-08 11:47:27.932760: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library curand64_10.dll
2020-05-08 11:47:27.944938: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library cusolver64_10.dll
2020-05-08 11:47:27.952321: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library cusparse64_10.dll
2020-05-08 11:47:27.960042: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library cudnn64_7.dll
2020-05-08 11:47:27.960698: I tensorflow/core/common_runtime/gpu/] Adding visible gpu devices: 0
2020-05-08 11:47:27.961058: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-05-08 11:47:27.969636: I tensorflow/compiler/xla/service/] XLA service 0x2df4e1dcd00 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-05-08 11:47:27.969831: I tensorflow/compiler/xla/service/]   StreamExecutor device (0): Host, Default Version
2020-05-08 11:47:27.970579: I tensorflow/core/common_runtime/gpu/] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.815GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2020-05-08 11:47:27.970964: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library cudart64_101.dll
2020-05-08 11:47:27.971208: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library cublas64_10.dll
2020-05-08 11:47:27.971389: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library cufft64_10.dll
2020-05-08 11:47:27.971602: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library curand64_10.dll
2020-05-08 11:47:27.971839: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library cusolver64_10.dll
2020-05-08 11:47:27.972112: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library cusparse64_10.dll
2020-05-08 11:47:27.972324: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library cudnn64_7.dll
2020-05-08 11:47:27.973322: I tensorflow/core/common_runtime/gpu/] Adding visible gpu devices: 0
2020-05-08 11:47:28.530960: I tensorflow/core/common_runtime/gpu/] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-08 11:47:28.531109: I tensorflow/core/common_runtime/gpu/]      0
2020-05-08 11:47:28.531180: I tensorflow/core/common_runtime/gpu/] 0:   N
2020-05-08 11:47:28.532337: I tensorflow/core/common_runtime/gpu/] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6213 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-05-08 11:47:28.534819: I tensorflow/compiler/xla/service/] XLA service 0x2df7aeb31a0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-05-08 11:47:28.534946: I tensorflow/compiler/xla/service/]   StreamExecutor device (0): GeForce RTX 2070 SUPER, Compute Capability 7.5
Model: "model"
Layer (type)                    Output Shape         Param #     Connected to
input_1 (InputLayer)            [(None, 22)]         0
tf_op_layer_ExpandDims (TensorF [(None, 22, 1)]      0           input_1[0][0]
dense (Dense)                   (None, 22, 64)       128         tf_op_layer_ExpandDims[0][0]
dense_3 (Dense)                 (None, 22, 64)       128         tf_op_layer_ExpandDims[0][0]
dense_6 (Dense)                 (None, 22, 64)       128         tf_op_layer_ExpandDims[0][0]
dense_9 (Dense)                 (None, 22, 64)       128         tf_op_layer_ExpandDims[0][0]
dense_12 (Dense)                (None, 22, 64)       128         tf_op_layer_ExpandDims[0][0]
dense_15 (Dense)                (None, 22, 64)       128         tf_op_layer_ExpandDims[0][0]
gaussian_dropout (GaussianDropo (None, 22, 64)       0           dense[0][0]
gaussian_dropout_2 (GaussianDro (None, 22, 64)       0           dense_3[0][0]
gaussian_dropout_4 (GaussianDro (None, 22, 64)       0           dense_6[0][0]
gaussian_dropout_6 (GaussianDro (None, 22, 64)       0           dense_9[0][0]
gaussian_dropout_8 (GaussianDro (None, 22, 64)       0           dense_12[0][0]
gaussian_dropout_10 (GaussianDr (None, 22, 64)       0           dense_15[0][0]
bidirectional (Bidirectional)   (None, 22, 16)       4672        gaussian_dropout[0][0]
bidirectional_2 (Bidirectional) (None, 22, 16)       4672        gaussian_dropout_2[0][0]
bidirectional_4 (Bidirectional) (None, 22, 16)       4672        gaussian_dropout_4[0][0]
bidirectional_6 (Bidirectional) (None, 22, 16)       4672        gaussian_dropout_6[0][0]
bidirectional_8 (Bidirectional) (None, 22, 16)       4672        gaussian_dropout_8[0][0]
bidirectional_10 (Bidirectional (None, 22, 16)       4672        gaussian_dropout_10[0][0]
bidirectional_1 (Bidirectional) (None, 22, 16)       1600        bidirectional[0][0]
bidirectional_3 (Bidirectional) (None, 22, 16)       1600        bidirectional_2[0][0]
bidirectional_5 (Bidirectional) (None, 22, 16)       1600        bidirectional_4[0][0]
bidirectional_7 (Bidirectional) (None, 22, 16)       1600        bidirectional_6[0][0]
bidirectional_9 (Bidirectional) (None, 22, 16)       1600        bidirectional_8[0][0]
bidirectional_11 (Bidirectional (None, 22, 16)       1600        bidirectional_10[0][0]
conv1d (Conv1D)                 (None, 20, 13)       1780        bidirectional_1[0][0]
conv1d_4 (Conv1D)               (None, 20, 13)       1780        bidirectional_3[0][0]
conv1d_8 (Conv1D)               (None, 20, 13)       1780        bidirectional_5[0][0]
conv1d_12 (Conv1D)              (None, 20, 13)       1780        bidirectional_7[0][0]
conv1d_16 (Conv1D)              (None, 20, 13)       1780        bidirectional_9[0][0]
conv1d_20 (Conv1D)              (None, 20, 13)       1780        bidirectional_11[0][0]
conv1d_1 (Conv1D)               (None, 20, 10)       1620        conv1d[0][0]
conv1d_5 (Conv1D)               (None, 20, 10)       1620        conv1d_4[0][0]
conv1d_9 (Conv1D)               (None, 20, 10)       1620        conv1d_8[0][0]
conv1d_13 (Conv1D)              (None, 20, 10)       1620        conv1d_12[0][0]
conv1d_17 (Conv1D)              (None, 20, 10)       1620        conv1d_16[0][0]
conv1d_21 (Conv1D)              (None, 20, 10)       1620        conv1d_20[0][0]
conv1d_2 (Conv1D)               (None, 20, 7)        1620        conv1d_1[0][0]
conv1d_6 (Conv1D)               (None, 20, 7)        1620        conv1d_5[0][0]
conv1d_10 (Conv1D)              (None, 20, 7)        1620        conv1d_9[0][0]
conv1d_14 (Conv1D)              (None, 20, 7)        1620        conv1d_13[0][0]
conv1d_18 (Conv1D)              (None, 20, 7)        1620        conv1d_17[0][0]
conv1d_22 (Conv1D)              (None, 20, 7)        1620        conv1d_21[0][0]
conv1d_3 (Conv1D)               (None, 20, 4)        1620        conv1d_2[0][0]
conv1d_7 (Conv1D)               (None, 20, 4)        1620        conv1d_6[0][0]
conv1d_11 (Conv1D)              (None, 20, 4)        1620        conv1d_10[0][0]
conv1d_15 (Conv1D)              (None, 20, 4)        1620        conv1d_14[0][0]
conv1d_19 (Conv1D)              (None, 20, 4)        1620        conv1d_18[0][0]
conv1d_23 (Conv1D)              (None, 20, 4)        1620        conv1d_22[0][0]
batch_normalization (BatchNorma (None, 20, 4)        16          conv1d_3[0][0]
batch_normalization_1 (BatchNor (None, 20, 4)        16          conv1d_7[0][0]
batch_normalization_2 (BatchNor (None, 20, 4)        16          conv1d_11[0][0]
batch_normalization_3 (BatchNor (None, 20, 4)        16          conv1d_15[0][0]
batch_normalization_4 (BatchNor (None, 20, 4)        16          conv1d_19[0][0]
batch_normalization_5 (BatchNor (None, 20, 4)        16          conv1d_23[0][0]
dense_1 (Dense)                 (None, 20, 128)      640         batch_normalization[0][0]
dense_4 (Dense)                 (None, 20, 128)      640         batch_normalization_1[0][0]
dense_7 (Dense)                 (None, 20, 128)      640         batch_normalization_2[0][0]
dense_10 (Dense)                (None, 20, 128)      640         batch_normalization_3[0][0]
dense_13 (Dense)                (None, 20, 128)      640         batch_normalization_4[0][0]
dense_16 (Dense)                (None, 20, 128)      640         batch_normalization_5[0][0]
gaussian_dropout_1 (GaussianDro (None, 20, 128)      0           dense_1[0][0]
gaussian_dropout_3 (GaussianDro (None, 20, 128)      0           dense_4[0][0]
gaussian_dropout_5 (GaussianDro (None, 20, 128)      0           dense_7[0][0]
gaussian_dropout_7 (GaussianDro (None, 20, 128)      0           dense_10[0][0]
gaussian_dropout_9 (GaussianDro (None, 20, 128)      0           dense_13[0][0]
gaussian_dropout_11 (GaussianDr (None, 20, 128)      0           dense_16[0][0]
flatten (Flatten)               (None, 2560)         0           gaussian_dropout_1[0][0]
flatten_1 (Flatten)             (None, 2560)         0           gaussian_dropout_3[0][0]
flatten_2 (Flatten)             (None, 2560)         0           gaussian_dropout_5[0][0]
flatten_3 (Flatten)             (None, 2560)         0           gaussian_dropout_7[0][0]
flatten_4 (Flatten)             (None, 2560)         0           gaussian_dropout_9[0][0]
flatten_5 (Flatten)             (None, 2560)         0           gaussian_dropout_11[0][0]
dense_2 (Dense)                 (None, 1)            2561        flatten[0][0]
dense_5 (Dense)                 (None, 1)            2561        flatten_1[0][0]
dense_8 (Dense)                 (None, 1)            2561        flatten_2[0][0]
dense_11 (Dense)                (None, 1)            2561        flatten_3[0][0]
dense_14 (Dense)                (None, 1)            2561        flatten_4[0][0]
dense_17 (Dense)                (None, 1)            2561        flatten_5[0][0]
concatenate (Concatenate)       (None, 6)            0           dense_2[0][0]
Total params: 97,542
Trainable params: 97,494
Non-trainable params: 48
***** Training Net ForkedConvLSTM_D64_LSTM2x8_Conv4x20x4_D1x128_dr0.40 now *****
BatchSize: 2108, NumNetParams: 97542, Feature shape: (500000, 22), Output shape: (500000, 6), In/Out Elem.: 14.0000M with est. size: 448.0000 MB
Epoch 1/10000
2020-05-08 11:47:57.675309: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library cublas64_10.dll
2020-05-08 11:47:57.962354: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library cudnn64_7.dll
2020-05-08 11:47:59.216097: W tensorflow/stream_executor/gpu/] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
238/238 [==============================] - 21s 90ms/step - loss: 0.3145 - val_loss: 0.0846 - lr: 0.0100
Epoch 2/10000
238/238 [==============================] - 15s 62ms/step - loss: 0.0851 - val_loss: 0.0837 - lr: 0.0100
Epoch 694/10000
238/238 [==============================] - 14s 61ms/step - loss: 0.0833 - val_loss: 0.0836 - lr: 5.0000e-05
Epoch 695/10000
  6/238 [..............................] - ETA: 12s - loss: 0.08302020-05-08 14:39:02.141015: E tensorflow/stream_executor/] CUDNN_STATUS_INTERNAL_ERROR
in tensorflow/stream_executor/cuda/ 'cudnnRNNBackwardData( cudnn.handle(), rnn_desc.handle(), model_dims.max_seq_length, output_desc.handles(), output_data.opaque(), output_desc.handles(), output_backprop_data.opaque(), output_h_desc.handle(), output_h_backprop_data.opaque(), output_c_desc.handle(), output_c_backprop_data.opaque(), rnn_desc.params_handle(), params.opaque(), input_h_desc.handle(), input_h_data.opaque(), input_c_desc.handle(), input_c_data.opaque(), input_desc.handles(), input_backprop_data->opaque(), input_h_desc.handle(), input_h_backprop_data->opaque(), input_c_desc.handle(), input_c_backprop_data->opaque(), workspace.opaque(), workspace.size(), reserve_space_data->opaque(), reserve_space_data->size())'
2020-05-08 14:39:02.141642: W tensorflow/core/framework/] OP_REQUIRES failed at : Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 16, 8, 1, 22, 2108, 8]
2020-05-08 14:39:02.141037: F tensorflow/stream_executor/cuda/] Check failed: status == CUDNN_STATUS_SUCCESS (7 vs. 0)Failed to set cuDNN stream.
Process finished with exit code -1073740791 (0xC0000409)
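
The negative exit code on the last line is the signed 32-bit view of a Windows NTSTATUS value. A minimal sketch for converting it back is shown below; the helper names are hypothetical and the lookup table only contains the one status that appears in this log.

```python
def to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit process exit code as an unsigned NTSTATUS."""
    return exit_code & 0xFFFFFFFF


# Only the status seen in the log above; extend as needed.
KNOWN_STATUS = {
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_exit_code(exit_code):
    """Format the exit code as hex plus its symbolic name, if known."""
    status = to_ntstatus(exit_code)
    name = KNOWN_STATUS.get(status, "unknown")
    return "0x{:08X} ({})".format(status, name)


print(describe_exit_code(-1073740791))  # prints 0xC0000409 (STATUS_STACK_BUFFER_OVERRUN)
```

On Windows this status is also what a deliberate fast-fail abort tends to report, so it most likely reflects the fatal `Check failed` line above it rather than a separate memory bug in the Python code.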

Here is some code that should be runnable and produce the output above:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# from os import environ
# environ['TF_CPP_MIN_LOG_LEVEL'] = '1'

from tensorflow.keras.models import *
from tensorflow.keras.layers import *
from tensorflow.keras.optimizers import *
import tensorflow as tf
import numpy as np
import sys

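def build_model_simple(inputLength=1, outputLength=1, lr=0.0001, device="/gpu:0",
                       nNeuFirstDense=64, dropoutRate=0.4,
                       numLSTM=2, nNeuLSTM=8,
                       numConv=4, nFiltConv=20, szConvKernel=4,
                       numDenseInner=1, nNeuDenseInner=128):
    # nNeuFirstDense and dropoutRate were missing from the posted signature; the
    # defaults 64 and 0.4 are inferred from the net name "D64 ... dr0.40" in the log.
    with tf.device(device):
        input = Input(shape=(inputLength,), dtype=tf.float32)
        inputExp = tf.expand_dims(input, -1)
        allInner = []
        # One independent branch per output value.
        for _ in range(outputLength):
            inner = Dense(nNeuFirstDense, activation="linear")(inputExp)
            inner = GaussianDropout(rate=dropoutRate)(inner)

            if numLSTM and nNeuLSTM:
                for _ in range(numLSTM):
                    inner = (Bidirectional(LSTM(nNeuLSTM, return_sequences=True))(inner))

            if numConv:
                for _ in range(numConv):
                    # The rest of this call was cut off in the post. channels_first is
                    # inferred from the Conv1D shapes and parameter counts in the model
                    # summary; the original activation is unknown (Keras default here).
                    inner = Conv1D(filters=nFiltConv, kernel_size=szConvKernel,
                                   strides=1, padding='valid',
                                   data_format='channels_first')(inner)
                inner = BatchNormalization()(inner)

            if numDenseInner:
                for _ in range(numDenseInner):
                    inner = Dense(nNeuDenseInner, activation="linear")(inner)
                    inner = GaussianDropout(rate=dropoutRate)(inner)
            inner = Flatten()(inner)
            inner = Dense(1, activation="linear")(inner)
            # Collect the branch output (this step was missing from the post).
            allInner.append(inner)
        out = Concatenate()(allInner)
        # out = outTmp * outTmp * outTmp
        model = Model(inputs=input, outputs=out)

        model.compile(loss="mse", optimizer=Adam(lr=lr))
        # model.compile(loss="mse", optimizer=Adadelta())
        # The argument list was truncated in the post; it is completed here so that
        # it fills all nine placeholders and reproduces the net name from the log.
        return model, 'ForkedConvLSTM_D{}_LSTM{}x{}_Conv{}x{}x{}_D{}x{}_dr{:.2f}'.format(
            nNeuFirstDense,
            numLSTM, nNeuLSTM,
            numConv, nFiltConv, szConvKernel,
            numDenseInner, nNeuDenseInner,
            dropoutRate)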

def scheduler(epoch, lrStart, lrEnd, lrDecay=0.05, lrNStable=10):
    lr = lrStart
    if epoch > lrNStable:
        fac = tf.math.exp(lrDecay * (lrNStable - epoch))
        lr = lrStart * fac + lrEnd * (1 - fac)
    return lr

if __name__ == '__main__':
    numFeatures = 22
    numOutputs = 6

    trainIn = np.random.rand(500000, numFeatures)
    trainOut = np.random.rand(500000, numOutputs)
    valiIn = np.random.rand(12000, numFeatures)
    valiOut = np.random.rand(12000, numOutputs)

    numDataElements = trainIn.shape[0] * (trainIn.shape[1] + trainOut.shape[1])
    sizeCalc = numDataElements * sys.getsizeof(trainIn[0][0])

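    EPOCHS = 10000
    # LEARNING_RATE_START was missing from the post; 0.01 is taken from
    # "LRstart: 0.01" in the log.
    LEARNING_RATE_START = 0.01
    LEARNING_RATE_END = 0.00005

    print("Starting training sweep with Epochs: {}, LRstart: {}, LRend: {}".format(
        EPOCHS, LEARNING_RATE_START, LEARNING_RATE_END))

    network, nwName = build_model_simple(inputLength=numFeatures, outputLength=numOutputs)

    netWeights = network.get_weights()
    # Total number of parameters. The list-comprehension body was lost in the post;
    # ele.size is assumed because it reproduces "NumNetParams: 97542" from the log.
    numNetPrams = np.sum([ele.size for ele in netWeights])

    # Estimation of batch size: GRAM * RAM factor / number of net parameters = ~75k.
    # Dividing this by 35 gives a good rough estimate for the batch size.
    BATCH_SIZE = int(np.floor(8 * 1e9 * 0.9 / numNetPrams / 35))

    print("***** Training Net {} now *****".format(nwName))
    print("BatchSize: {}, NumNetParams: {}, Feature shape: {}, Output shape: "
          "{}, In/Out Elem.: {:.4f}M with est. size: {:.4f} MB".format(
              BATCH_SIZE, numNetPrams, trainIn.shape, trainOut.shape,
              numDataElements / 1e6, sizeCalc / 1e6))

    # The schedule argument was cut off in the post; this lambda is an assumed
    # reconstruction. float() converts the tensor returned by tf.math.exp.
    callback = tf.keras.callbacks.LearningRateScheduler(
        lambda epoch: float(scheduler(epoch, LEARNING_RATE_START, LEARNING_RATE_END)))
    # Any arguments after `callbacks` were cut off in the post.
    fitRes = network.fit(trainIn, trainOut, batch_size=BATCH_SIZE, epochs=EPOCHS,
                         validation_data=(valiIn, valiOut),
                         callbacks=[callback, tf.keras.callbacks.TerminateOnNaN()])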
1 Answer

1 vote
/ May 27, 2020

For those who come after me:

I played around a lot with different versions. I even tried to get CUDA 10.2 working by linking the new DLLs under the old names. But even that did not get rid of the error.
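
When juggling CUDA versions like this, it can help to see which copy of each library Windows would pick up. The sketch below looks for the DLL names from the question's log in the directories on `PATH`; it is only an approximation, since the real Windows DLL search order includes more locations than `PATH`, and `locate_dlls` is a hypothetical helper name.

```python
import os

# DLL names copied from the "Successfully opened dynamic library" log lines.
REQUIRED_DLLS = [
    "cudart64_101.dll", "cublas64_10.dll", "cufft64_10.dll", "curand64_10.dll",
    "cusolver64_10.dll", "cusparse64_10.dll", "cudnn64_7.dll",
]


def locate_dlls(names, search_dirs):
    """Map each DLL name to the first directory containing it, or None."""
    found = {}
    for name in names:
        found[name] = None
        for directory in search_dirs:
            if directory and os.path.isfile(os.path.join(directory, name)):
                found[name] = directory
                break
    return found


if __name__ == "__main__":
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    for name, directory in locate_dlls(REQUIRED_DLLS, path_dirs).items():
        print("{:<22} {}".format(name, directory or "NOT FOUND on PATH"))
```

If a name resolves to a directory belonging to a different CUDA release than expected, that points to leftovers from an earlier installation.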

I finally managed to get it working by uninstalling everything from NVIDIA (including the drivers) and installing the newest 10.1 release (from late 2019) with the studio drivers from that release. So, version 431.86 instead of the latest studio version 441.66.
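
To confirm which driver is actually active after such a reinstall, the version can be read from `nvidia-smi` and compared with the one that worked here. This is a minimal sketch that assumes `nvidia-smi` is on `PATH`; the helper names are hypothetical, and the code avoids `capture_output` so that it also runs on Python 3.6.

```python
import shutil
import subprocess


def parse_driver_version(text):
    """Turn a string like '431.86' into a comparable tuple (431, 86)."""
    return tuple(int(part) for part in text.strip().split("."))


def installed_driver_version():
    """Ask nvidia-smi for the driver version; return None if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
            universal_newlines=True, check=True, timeout=30,
        )
        lines = result.stdout.strip().splitlines()
        return parse_driver_version(lines[0]) if lines else None
    except (subprocess.SubprocessError, OSError, ValueError):
        return None


if __name__ == "__main__":
    known_good = parse_driver_version("431.86")  # version reported as working above
    current = installed_driver_version()
    if current is None:
        print("Could not determine the driver version (is nvidia-smi on PATH?)")
    elif current == known_good:
        print("Driver matches the version reported as working")
    else:
        print("Driver", current, "differs from", known_good)
```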

I don't think the previous installations were faulty, so my guess is that the driver version was the problem.
