I installed cuDNN, but I still get the error "Failed to get convolution algorithm" - PullRequest
0 votes
/ March 26, 2019

So, I have a machine with an RTX 2060 and I want to run TensorFlow on it. However, the error "Failed to get convolution algorithm" appears even though I installed cuDNN on it.

The computer runs Linux (Xubuntu 18.04) with TensorFlow-GPU 1.13.1. I followed the instructions on the website (reproduced below) and installed tensorflow-gpu via pip.

The instructions I followed:


# Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update

# Install NVIDIA driver
sudo apt-get install --no-install-recommends nvidia-driver-410
# Reboot. Check that GPUs are visible using the command: nvidia-smi

# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
    cuda-10-0 \
    libcudnn7=7.4.1.5-1+cuda10.0  \
    libcudnn7-dev=7.4.1.5-1+cuda10.0

# Install TensorRT. Requires that libcudnn7 is installed above.
sudo apt-get update && \
        sudo apt-get install nvinfer-runtime-trt-repo-ubuntu1804-5.0.2-ga-cuda10.0 \
        && sudo apt-get update \
        && sudo apt-get install -y --no-install-recommends libnvinfer-dev=5.0.2-1+cuda10.0
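
As a rough sanity check after steps like these, the installed driver and package versions can be gathered in one place. The sketch below is only an illustration: it assumes `nvidia-smi` and `dpkg-query` are on the PATH, uses the package names from the commands above, and the helper names `run_or_none` and `report_versions` are hypothetical.

```python
import shutil
import subprocess


def run_or_none(cmd):
    """Return the stdout of cmd, or None if the tool is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # A non-zero exit code (e.g. a package that is not installed) counts as failure.
    return result.stdout.strip() if result.returncode == 0 else None


def report_versions():
    """Collect driver and package versions as a dict of raw strings."""
    return {
        "driver": run_or_none(
            ["nvidia-smi", "--query-gpu=driver_version,name", "--format=csv,noheader"]
        ),
        "packages": run_or_none(
            ["dpkg-query", "-W", "-f=${Package} ${Version}\n",
             "cuda-10-0", "libcudnn7", "libcudnn7-dev"]
        ),
    }


if __name__ == "__main__":
    for label, value in report_versions().items():
        print(label, "->", value if value is not None else "not found")
```

A value of "not found" only means the query failed on this machine; it does not say why.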

The error I get:

2019-03-25 23:16:50.938950: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-03-25 23:16:52.732720: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-03-25 23:16:52.736377: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node ppo/actions-and-internals/layered-network/apply/conv2d0/apply/Conv2D}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "start.py", line 54, in <module>
    main()
  File "start.py", line 51, in main
    main_loop(agent, curiousity_engine)
  File "start.py", line 23, in main_loop
    action1 = agent.act(states=get_screen())
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/agents/agent.py", line 148, in act
    independent=independent
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/model.py", line 1393, in act
    fetch_list = self.monitored_session.run(fetches=fetches, feed_dict=feed_dict)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 676, in run
    run_metadata=run_metadata)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1270, in run
    raise six.reraise(*original_exc_info)
  File "/home/user/.local/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1255, in run
    return self._sess.run(*args, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1327, in run
    run_metadata=run_metadata)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1091, in run
    return self._sess.run(*args, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node ppo/actions-and-internals/layered-network/apply/conv2d0/apply/Conv2D (defined at /home/user/.local/lib/python3.6/site-packages/tensorforce/core/networks/layer.py:1079) ]]

Caused by op 'ppo/actions-and-internals/layered-network/apply/conv2d0/apply/Conv2D', defined at:
  File "start.py", line 54, in <module>
    main()
  File "start.py", line 41, in main
    agent, user_input = agent_build()
  File "/home/user/Downloads/v2 (2)/agent.py", line 37, in agent_build
    actions_exploration = 'epsilon_decay'
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/agents/ppo_agent.py", line 155, in __init__
    entropy_regularization=entropy_regularization
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/agents/learning_agent.py", line 141, in __init__
    batching_capacity=batching_capacity
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/agents/agent.py", line 80, in __init__
    self.model = self.initialize_model()
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/agents/ppo_agent.py", line 183, in initialize_model
    likelihood_ratio_clipping=self.likelihood_ratio_clipping
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/pg_prob_ratio_model.py", line 88, in __init__
    gae_lambda=gae_lambda
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/pg_model.py", line 98, in __init__
    requires_deterministic=False
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/distribution_model.py", line 90, in __init__
    discount=discount
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/memory_model.py", line 114, in __init__
    reward_preprocessing=reward_preprocessing
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/model.py", line 217, in __init__
    self.setup()
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/model.py", line 290, in setup
    independent=independent
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/memory_model.py", line 605, in create_operations
    independent=independent
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/model.py", line 1193, in create_operations
    independent=independent
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/model.py", line 1019, in create_act_operations
    deterministic=deterministic
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 368, in __call__
    return self._call_func(args, kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 311, in _call_func
    result = self._func(*args, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/distribution_model.py", line 187, in tf_actions_and_internals
    return_internals=True
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 368, in __call__
    return self._call_func(args, kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 311, in _call_func
    result = self._func(*args, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/core/networks/network.py", line 253, in tf_apply
    x = layer.apply(x=x, update=update)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 368, in __call__
    return self._call_func(args, kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 311, in _call_func
    result = self._func(*args, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/core/networks/layer.py", line 1079, in tf_apply
    x = tf.nn.conv2d(input=x, filter=self.filters, strides=(1, stride_h, stride_w, 1), padding=self.padding)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1026, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node ppo/actions-and-internals/layered-network/apply/conv2d0/apply/Conv2D (defined at /home/user/.local/lib/python3.6/site-packages/tensorforce/core/networks/layer.py:1079) ]]

Answers [ 2 ]

1 vote
/ April 18, 2019

I ran into the same problem(s) with the same setup. I found (if I remember correctly) that some of the later commands install a newer version of the driver. Matching versions appears to be very important. In addition, my mouse stopped working because some input package got uninstalled.

Sorting out the mess cost me days and numerous clean installs... What finally worked was installing the driver, CUDA, and cuDNN manually. The process is far from optimal, and my end result is not as clean as I would like, but it works.

My versions: Driver: 410.48, CUDA: 10.0, cuDNN: 7.4.2 (TensorRT: pick the one that uses cuDNN 7.4.2)
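
Since version matching seems to matter here, a small check can make a mismatch visible. The sketch below assumes apt-style version strings of the form seen in the question (`7.4.1.5-1+cuda10.0`); the function names are hypothetical, and the second version string in the usage lines is only an illustrative input.

```python
import re

# Matches strings such as "7.4.1.5-1+cuda10.0":
# <cuDNN version>-<package revision>+cuda<CUDA version>
_PKG_VERSION = re.compile(
    r"^(?P<cudnn>\d+(?:\.\d+)*)-\d+\+cuda(?P<cuda>\d+\.\d+)$"
)


def split_cudnn_package_version(version):
    """Split a package version into (cudnn_version, cuda_version)."""
    match = _PKG_VERSION.match(version.strip())
    if match is None:
        raise ValueError("unrecognized package version: %r" % version)
    return match.group("cudnn"), match.group("cuda")


def matches_expected(version, cudnn_prefix, cuda):
    """True if the package is built for `cuda` and its cuDNN version
    equals `cudnn_prefix` or starts with it (e.g. "7.4.2")."""
    cudnn, built_for = split_cudnn_package_version(version)
    same_cudnn = cudnn == cudnn_prefix or cudnn.startswith(cudnn_prefix + ".")
    return built_for == cuda and same_cudnn


if __name__ == "__main__":
    # The version pinned in the question versus the cuDNN 7.4.2 used here.
    print(matches_expected("7.4.1.5-1+cuda10.0", "7.4.2", "10.0"))
    # Illustrative input only.
    print(matches_expected("7.4.2.24-1+cuda10.0", "7.4.2", "10.0"))
```

This compares strings only; it says nothing about which combinations actually work together.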

In addition, it was necessary to add one of the following snippets to the TensorFlow Python code:

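import tensorflow as tf

# Let TensorFlow grow its GPU memory allocation on demand instead of reserving it all at startup.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.enable_eager_execution(config=config)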

or

config = tf.ConfigProto()
# Alternative to allow_growth: cap this process at a fixed share of GPU memory (here 10%).
# config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.1
sess = tf.Session(config=config)
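
The two variants above differ only in which `gpu_options` field they set, so they can be folded into one helper. This is a minimal sketch, assuming the TensorFlow 1.x API used in this answer (`tf.ConfigProto`, `tf.Session`); the names `gpu_option_overrides` and `make_session` are hypothetical, and TensorFlow is imported lazily so the validation part runs without it.

```python
def gpu_option_overrides(allow_growth=True, memory_fraction=None):
    """Build the gpu_options fields to set, validating the fraction."""
    if memory_fraction is not None and not 0.0 < memory_fraction <= 1.0:
        raise ValueError("memory_fraction must be in (0, 1]")
    overrides = {"allow_growth": bool(allow_growth)}
    if memory_fraction is not None:
        overrides["per_process_gpu_memory_fraction"] = float(memory_fraction)
    return overrides


def make_session(allow_growth=True, memory_fraction=None):
    """Create a TF 1.x session with the chosen GPU memory settings."""
    import tensorflow as tf  # assumption: TensorFlow 1.x is installed

    config = tf.ConfigProto()
    overrides = gpu_option_overrides(allow_growth, memory_fraction)
    for name, value in overrides.items():
        setattr(config.gpu_options, name, value)
    return tf.Session(config=config)


if __name__ == "__main__":
    # Show the settings without touching the GPU.
    print(gpu_option_overrides())
    print(gpu_option_overrides(allow_growth=False, memory_fraction=0.1))
```

Calling `make_session()` corresponds to the `allow_growth` variant, and `make_session(allow_growth=False, memory_fraction=0.1)` to the fixed-fraction one.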
0 votes
/ April 18, 2019

Initialize your code with the following:

import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

See the discussion here for details.
