I'm new to running YOLO object detection on a GPU. The configuration of the lab server is:
Ubuntu 18.04, TensorFlow 2.2.0, CUDA 10.1, with 4 Tesla GPUs.
When I run tf.test.is_gpu_available(), it returns True, and the output is:
pciBusID: 0000:04:00.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0
coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-04-23 08:56:49.116943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0
coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-04-23 08:56:49.119336: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 2 with properties:
pciBusID: 0000:86:00.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0
coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-04-23 08:56:49.121766: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 3 with properties:
pciBusID: 0000:8a:00.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0
coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-04-23 08:56:49.121840: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-23 08:56:49.121877: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-23 08:56:49.121910: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-04-23 08:56:49.121941: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-04-23 08:56:49.121971: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-04-23 08:56:49.122001: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-04-23 08:56:49.122032: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-23 08:56:49.134068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1, 2, 3
2020-04-23 08:56:49.134119: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-23 08:56:49.140144: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-23 08:56:49.140169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0 1 2 3
2020-04-23 08:56:49.140180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N Y N N
2020-04-23 08:56:49.140187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 1: Y N N N
2020-04-23 08:56:49.140194: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 2: N N N Y
2020-04-23 08:56:49.140203: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 3: N N Y N
2020-04-23 08:56:49.146042: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/device:GPU:0 with 141 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:04:00.0, compute capability: 6.0)
2020-04-23 08:56:49.148097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/device:GPU:1 with 14758 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0)
2020-04-23 08:56:49.150111: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/device:GPU:2 with 14758 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:86:00.0, compute capability: 6.0)
2020-04-23 08:56:49.152165: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/device:GPU:3 with 14758 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:8a:00.0, compute capability: 6.0)
True
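In TensorFlow 2.x, tf.test.is_gpu_available() is deprecated, and its deprecation message points to tf.config.list_physical_devices('GPU') instead. A minimal sketch of that check, which lists every GPU TensorFlow can see without creating devices on them:

```python
import tensorflow as tf

# Non-deprecated replacement for tf.test.is_gpu_available():
# returns one PhysicalDevice entry per GPU visible to TensorFlow.
gpus = tf.config.list_physical_devices("GPU")
print(f"Visible GPUs: {len(gpus)}")
for gpu in gpus:
    # e.g. /physical_device:GPU:0
    print(gpu.name)
```

On the server described above this should list four devices; seeing all four only means TensorFlow can use them, not that a given training script actually will.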
So the machine recognizes all 4 GPUs. However, as soon as I run my program, a strange problem appears: only one GPU works properly and the others are not used. Here is the nvidia-smi output while the program is running:
| NVIDIA-SMI 435.21 Driver Version: 435.21 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:04:00.0 Off | 0 |
| N/A 36C P0 31W / 250W | 15323MiB / 16280MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-PCIE... Off | 00000000:08:00.0 Off | 0 |
| N/A 41C P0 31W / 250W | 265MiB / 16280MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla P100-PCIE... Off | 00000000:86:00.0 Off | 0 |
| N/A 40C P0 32W / 250W | 265MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla P100-PCIE... Off | 00000000:8A:00.0 Off | 0 |
| N/A 38C P0 30W / 250W | 265MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 10540 C python 15313MiB |
| 1 10540 C python 255MiB |
| 2 10540 C python 255MiB |
| 3 10540 C python 255MiB |
+-----------------------------------------------------------------------------+
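In the nvidia-smi output, GPU 0 holds about 15 GB while its utilization is only 2%, and GPUs 1 to 3 hold roughly 255 MiB each, which is about what a bare CUDA context takes. By default TensorFlow reserves most of a GPU's memory the first time it allocates on that GPU, so the memory column mainly shows which GPU the program has touched, not how busy it is. A minimal sketch, assuming you want memory to be allocated on demand instead (this does not by itself spread work across GPUs):

```python
import tensorflow as tf

# Must run before any op touches the GPUs; otherwise TensorFlow raises
# a RuntimeError because the devices are already initialized.
gpus = tf.config.list_physical_devices("GPU")
for gpu in gpus:
    # Grow the allocation as needed instead of reserving it all up front.
    tf.config.experimental.set_memory_growth(gpu, True)
print(f"Memory growth enabled on {len(gpus)} GPU(s)")
```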
Someone online said that my TensorFlow version doesn't match my CUDA version, but I can't change the CUDA version because the server isn't mine; I can only change the TensorFlow version. Can anyone suggest how to make the program use all 4 GPUs? Thanks.
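For reference, the official TensorFlow 2.2 builds target CUDA 10.1 and cuDNN 7, which matches the libcudart.so.10.1 and libcudnn.so.7 libraries the log shows loading successfully. A more likely cause is that a plain Keras model trains on a single GPU unless it is built under a distribution strategy. A minimal sketch using tf.distribute.MirroredStrategy, where the small Sequential model and the random data are hypothetical stand-ins for the real YOLO model and dataset:

```python
import numpy as np
import tensorflow as tf

# MirroredStrategy replicates the model on every visible GPU (falling back
# to a single CPU replica if none are visible) and splits each batch among them.
strategy = tf.distribute.MirroredStrategy()
print(f"Number of replicas: {strategy.num_replicas_in_sync}")

# Hypothetical stand-in for the YOLO model: the model must be created
# and compiled inside strategy.scope() so its variables are mirrored.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=(64, 64, 3)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(4),
    ])
    model.compile(optimizer="adam", loss="mse")

# Scale the global batch with the replica count so each GPU still
# receives per_replica_batch examples per step.
per_replica_batch = 8
global_batch = per_replica_batch * strategy.num_replicas_in_sync

# Random dummy data, only to show the call pattern.
x = np.random.rand(64, 64, 64, 3).astype("float32")
y = np.random.rand(64, 4).astype("float32")
history = model.fit(x, y, batch_size=global_batch, epochs=1, verbose=0)
```

If the YOLO code uses a custom training loop instead of model.fit, the dataset would instead need to go through strategy.experimental_distribute_dataset and each step through strategy.run.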