Getting "CUDA_ERROR_INVALID_VALUE: invalid argument" in Python with TensorFlow
April 23, 2020

When I run the snippet below as python test.py:

import os
# Enable '0' or disable '-1' GPU use
# os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ['CUDA_VISIBLE_DEVICES'] = "0"
import warnings

with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=FutureWarning)
    import tensorflow as tf
    config = tf.compat.v1.ConfigProto()
    # config.gpu_options.visible_device_list = "0"  # pylint: disable=no-member
    config.gpu_options.allow_growth = True  # pylint: disable=no-member
    session = tf.compat.v1.Session(config=config)

# check if successfully using GPU
if tf.test.gpu_device_name():
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
else:
    print('Please install GPU version of TF')

I get the following error:

2020-04-23 13:13:15.969352: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-04-23 13:13:15.974088: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-04-23 13:13:15.990122: W tensorflow/compiler/xla/service/platform_util.cc:256] unable to create StreamExecutor for CUDA:0: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_INVALID_VALUE: invalid argument
2020-04-23 13:13:15.990240: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: no supported devices found for platform CUDA
Aborted (core dumped)

When I set os.environ['CUDA_VISIBLE_DEVICES'] = "-1" (i.e., no GPU is used), there is no error and the output is as expected:

2020-04-23 13:18:24.911806: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-04-23 13:18:24.916849: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-04-23 13:18:24.920347: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-04-23 13:18:24.920384: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: vumacs
2020-04-23 13:18:24.920389: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: vumacs
2020-04-23 13:18:24.920456: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 440.64.0
2020-04-23 13:18:24.920482: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 440.64.0
2020-04-23 13:18:24.920489: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 440.64.0
2020-04-23 13:18:24.938734: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3299990000 Hz
2020-04-23 13:18:24.939659: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4849f40 executing computations on platform Host. Devices:
2020-04-23 13:18:24.939686: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
Please install GPU version of TF
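
The CUDA_VISIBLE_DEVICES toggle used in the two runs above can be wrapped in a small helper. This is only a sketch: `select_gpus` is a hypothetical name, not a TensorFlow API, and it relies on the variable being read when the CUDA driver is first initialized, so it has to be called before `import tensorflow`.

```python
import os


def select_gpus(device_ids):
    """Set CUDA_VISIBLE_DEVICES from a list of integer GPU indices.

    An empty list hides every GPU by writing "-1", the same convention
    used in test.py. `select_gpus` is a hypothetical helper, not part
    of TensorFlow.
    """
    value = ",".join(str(i) for i in device_ids) if device_ids else "-1"
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Call this before `import tensorflow`; setting the variable after the
# CUDA driver has been initialized has no effect on the running process.
select_gpus([0])  # expose only GPU 0
```

Calling `select_gpus([])` reproduces the CPU-only run, and `select_gpus([0])` reproduces the failing one.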

Is there a way to fix this error? I previously used the same code with CUDA_VISIBLE_DEVICES set to 0, both in scripts and in the shell, and had no problems. The error seems to occur when the session is created with tf.compat.v1.Session(config=config).
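
One way to check whether session creation alone triggers the abort is to run just that step in a child process, so the crash does not take the calling script down with it. The following is a minimal sketch under that assumption; `run_isolated` and `SESSION_PROBE` are hypothetical names introduced here, and the probe simply repeats the session setup from test.py.

```python
import subprocess
import sys


def run_isolated(snippet, timeout=120):
    """Run `snippet` in a fresh Python process; return (returncode, stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        timeout=timeout,
    )
    return result.returncode, result.stderr


# Assumed probe: the same session setup as in test.py and nothing else.
SESSION_PROBE = """
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import tensorflow as tf
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
tf.compat.v1.Session(config=config).close()
"""

if __name__ == "__main__":
    code, err = run_isolated(SESSION_PROBE)
    # On POSIX a negative return code means the child was killed by a
    # signal, e.g. SIGABRT for a run that ends in "Aborted (core dumped)".
    print("return code:", code)
    print(err[-2000:])  # tail of the child's log output
```

If the child exits with a signal while a probe that stops before `tf.compat.v1.Session(...)` exits cleanly, that would support the suspicion that the failure happens during session creation.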

Additional information

python: 3.6.9 tensorflow-gpu==1.14.0 protobuf==3.11.3 tensorflow-estimator==1.14.0

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

$ nvidia-smi
Thu Apr 23 13:22:06 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:B3:00.0 Off |                  N/A |
| 26%   28C    P8    12W / 250W |    119MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1277      G   /usr/lib/xorg/Xorg                            39MiB |
|    0      1388      G   /usr/bin/gnome-shell                          77MiB |
+-----------------------------------------------------------------------------+
...
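
The two version reports above measure different things: `nvcc --version` prints the release of the installed CUDA toolkit, while the "CUDA Version" field in the `nvidia-smi` header is the highest CUDA version the installed driver supports. So 10.0 versus 10.2 is not by itself a mismatch. A small sketch for pulling both numbers out of the text follows; the function names are hypothetical and the sample strings are copied from the output above.

```python
import re


def nvcc_release(nvcc_output):
    """Return the toolkit release (e.g. "10.0") from `nvcc --version` text."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def smi_versions(smi_output):
    """Return (driver_version, cuda_version) from the `nvidia-smi` header."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", smi_output)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", smi_output)
    return (driver.group(1) if driver else None,
            cuda.group(1) if cuda else None)


# Sample lines taken from the outputs posted above.
nvcc_text = "Cuda compilation tools, release 10.0, V10.0.130"
smi_text = "| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |"

print(nvcc_release(nvcc_text))  # installed toolkit release
print(smi_versions(smi_text))   # (driver version, max CUDA version the driver supports)
```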