Несколько графических процессоров AMD с Tensorflow и OpenCL в Ubuntu 16.04 - PullRequest
0 голосов
/ 18 октября 2018

После многих трудностей:

  1. Успешно построен Tensorflow с OpenCL на свежем Ubuntu 16.04 с amdgpu 17.50 .

  2. Установлено 5 идентичных графических процессоров (rx580) , и все они сообщаются clinfo и computecpp_info, как и ожидалось.

  3. При выполнении примера с вызовом MNIST TF работает, но использует только GPU0, не видя других графических процессоров .

* об ошибках не сообщается *1028* вdmesg о карте, кажется, что все они готовы на самом низком уровне, не знаю, почему SYCL, кажется, игнорирует некоторые карты .

Вот computecpp_info вывод:

********************************************************************************

ComputeCpp Info (CE 1.0.1)

SYCL 1.2.1 revision 3

********************************************************************************

Toolchain information:

GLIBC version: 2.23
GLIBCXX: 20160609
This version of libstdc++ is supported.

********************************************************************************


Device Info:

Discovered 5 devices matching:
  platform  : <any>
  device type : <any>

--------------------------------------------------------------------------------
Device 0:

  Device is supported                   : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                        : Ellesmere
  CL_DEVICE_VENDOR                      : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                     : 2527.3
  CL_DEVICE_TYPE                        : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 1:

  Device is supported                   : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                        : Ellesmere
  CL_DEVICE_VENDOR                      : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                     : 2527.3
  CL_DEVICE_TYPE                        : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 2:

  Device is supported                   : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                        : Ellesmere
  CL_DEVICE_VENDOR                      : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                     : 2527.3
  CL_DEVICE_TYPE                        : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 3:

  Device is supported                   : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                        : Ellesmere
  CL_DEVICE_VENDOR                      : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                     : 2527.3
  CL_DEVICE_TYPE                        : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 4:

  Device is supported                   : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                        : Ellesmere
  CL_DEVICE_VENDOR                      : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                     : 2527.3
  CL_DEVICE_TYPE                        : CL_DEVICE_TYPE_GPU

If you encounter problems when using any of these OpenCL devices, please consult
this website for known issues:
https://computecpp.codeplay.com/releases/v1.0.1/platform-support-notes

********************************************************************************

Вот список из tenorflow :

$ python3 list_gpus.py
2018-10-17 23:52:44.268968: I ./tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-10-17 23:52:44.385308: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:70] Found following OpenCL devices:
2018-10-17 23:52:44.385342: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:72] id: 0, type: GPU, name: Ellesmere, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 5429869323017416982
, name: "/device:SYCL:0"
device_type: "SYCL"
memory_limit: 268435456
locality {
}
incarnation: 7347791393919061653
physical_device_desc: "id: 0, type: GPU, name: Ellesmere, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE"
]

РЕДАКТИРОВАТЬ: после перезагрузки

Iна самом деле не знаю, актуальны ли эти предупреждения, потому что они исчезают после первого запуска.

$ python3 list_gpus.py
2018-10-18 00:47:13.943021: I ./tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-10-18 00:47:13.952909: W ./tensorflow/core/common_runtime/sycl/sycl_device.h:45] No OpenCL accelerator nor GPU found that is supported by ComputeCpp/triSYCL trying OpenCL CPU
2018-10-18 00:47:13.952930: W ./tensorflow/core/common_runtime/sycl/sycl_device.h:52] No OpenCL CPU found that is supported by ComputeCpp/triSYCL, checking for host sycl device
2018-10-18 00:47:13.952936: W ./tensorflow/core/common_runtime/sycl/sycl_device.h:59] Found SYCL host device
2018-10-18 00:47:13.953004: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:70] Found following OpenCL devices:
2018-10-18 00:47:13.953014: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:72] id: 0, type: Host, name: Host Device, vendor: Codeplay Software Ltd., profile: FULL_PROFILE

РЕДАКТИРОВАТЬ: dmesg details

[    0.000000] Linux version 4.15.0-36-generic (buildd@lcy01-amd64-017) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)) #39~16.04.1-Ubuntu SMP Tue Sep 25 08:59:23 UTC 2018 (Ubuntu 4.15.0-36.39~16.04.1-generic 4.15.18)
[    0.688885] pcie_mp2_amd: AMD(R) PCI-E MP2 Communication Driver Version: 1.0
[    1.143085] [drm] amdgpu kernel modesetting enabled.
[    1.173931] amdgpu 0000:03:00.0: enabling device (0000 -> 0003)
[    1.564757] amdgpu 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[    2.280211] amdgpu 0000:03:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    2.280212] amdgpu 0000:03:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[    2.280322] [drm] amdgpu: 4096M of VRAM memory ready
[    2.280323] [drm] amdgpu: 4096M of GTT memory ready.
[    2.280427] amdgpu 0000:03:00.0: amdgpu: using MSI.
[    2.280439] [drm] amdgpu: irq initialized.
[    2.280452] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[    2.280690] amdgpu 0000:03:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x        (ptrval)
[    2.280758] amdgpu 0000:03:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x        (ptrval)
[    2.280784] amdgpu 0000:03:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x        (ptrval)
[    2.280842] amdgpu 0000:03:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x        (ptrval)
[    2.280903] amdgpu 0000:03:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x        (ptrval)
[    2.280965] amdgpu 0000:03:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x        (ptrval)
[    2.280985] amdgpu 0000:03:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x        (ptrval)
[    2.281001] amdgpu 0000:03:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x        (ptrval)
[    2.281015] amdgpu 0000:03:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x        (ptrval)
[    2.281028] amdgpu 0000:03:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x        (ptrval)
[    2.281332] amdgpu 0000:03:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x        (ptrval)
[    2.281348] amdgpu 0000:03:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x        (ptrval)
[    2.285039] amdgpu 0000:03:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x        (ptrval)
[    2.285056] amdgpu 0000:03:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x        (ptrval)
[    2.285069] amdgpu 0000:03:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x        (ptrval)
[    2.285578] amdgpu 0000:03:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x        (ptrval)
[    2.285594] amdgpu 0000:03:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x        (ptrval)
[    2.980155] amdgpu 0000:03:00.0: kfd not supported on this ASIC
[    2.980163] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:03:00.0 on minor 0
[    2.980215] amdgpu 0000:06:00.0: enabling device (0000 -> 0003)
[    4.068205] amdgpu 0000:06:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    4.068206] amdgpu 0000:06:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[    4.068220] [drm] amdgpu: 4096M of VRAM memory ready
[    4.068221] [drm] amdgpu: 4096M of GTT memory ready.
[    4.068331] amdgpu 0000:06:00.0: amdgpu: using MSI.
[    4.068344] [drm] amdgpu: irq initialized.
[    4.068357] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[    4.068444] amdgpu 0000:06:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x        (ptrval)
[    4.068509] amdgpu 0000:06:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x        (ptrval)
[    4.068571] amdgpu 0000:06:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x        (ptrval)
[    4.068639] amdgpu 0000:06:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x        (ptrval)
[    4.068665] amdgpu 0000:06:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x        (ptrval)
[    4.068718] amdgpu 0000:06:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x        (ptrval)
[    4.068740] amdgpu 0000:06:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x        (ptrval)
[    4.068759] amdgpu 0000:06:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x        (ptrval)
[    4.068774] amdgpu 0000:06:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x        (ptrval)
[    4.068787] amdgpu 0000:06:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x        (ptrval)
[    4.069074] amdgpu 0000:06:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x        (ptrval)
[    4.069094] amdgpu 0000:06:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x        (ptrval)
[    4.072854] amdgpu 0000:06:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x        (ptrval)
[    4.072868] amdgpu 0000:06:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x        (ptrval)
[    4.072881] amdgpu 0000:06:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x        (ptrval)
[    4.073362] amdgpu 0000:06:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x        (ptrval)
[    4.073376] amdgpu 0000:06:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x        (ptrval)
[    4.771466] amdgpu 0000:06:00.0: kfd not supported on this ASIC
[    4.771476] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:06:00.0 on minor 2
[    4.771515] amdgpu 0000:07:00.0: enabling device (0000 -> 0003)
[    5.856168] amdgpu 0000:07:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    5.856169] amdgpu 0000:07:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[    5.856178] [drm] amdgpu: 4096M of VRAM memory ready
[    5.856179] [drm] amdgpu: 4096M of GTT memory ready.
[    5.856284] amdgpu 0000:07:00.0: amdgpu: using MSI.
[    5.856297] [drm] amdgpu: irq initialized.
[    5.856311] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[    5.856402] amdgpu 0000:07:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x        (ptrval)
[    5.856441] amdgpu 0000:07:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x        (ptrval)
[    5.856464] amdgpu 0000:07:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x        (ptrval)
[    5.856541] amdgpu 0000:07:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x        (ptrval)
[    5.856569] amdgpu 0000:07:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x        (ptrval)
[    5.856641] amdgpu 0000:07:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x        (ptrval)
[    5.856668] amdgpu 0000:07:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x        (ptrval)
[    5.856690] amdgpu 0000:07:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x        (ptrval)
[    5.856707] amdgpu 0000:07:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x        (ptrval)
[    5.856722] amdgpu 0000:07:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x        (ptrval)
[    5.857007] amdgpu 0000:07:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x        (ptrval)
[    5.857027] amdgpu 0000:07:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x        (ptrval)
[    5.860789] amdgpu 0000:07:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x        (ptrval)
[    5.860803] amdgpu 0000:07:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x        (ptrval)
[    5.860817] amdgpu 0000:07:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x        (ptrval)
[    5.861298] amdgpu 0000:07:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x        (ptrval)
[    5.861313] amdgpu 0000:07:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x        (ptrval)
[    6.563837] amdgpu 0000:07:00.0: kfd not supported on this ASIC
[    6.563845] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:07:00.0 on minor 3
[    6.563887] amdgpu 0000:08:00.0: enabling device (0000 -> 0003)
[    7.648177] amdgpu 0000:08:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    7.648178] amdgpu 0000:08:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[    7.648188] [drm] amdgpu: 4096M of VRAM memory ready
[    7.648188] [drm] amdgpu: 4096M of GTT memory ready.
[    7.648292] amdgpu 0000:08:00.0: amdgpu: using MSI.
[    7.648306] [drm] amdgpu: irq initialized.
[    7.648322] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[    7.648406] amdgpu 0000:08:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x        (ptrval)
[    7.648470] amdgpu 0000:08:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x        (ptrval)
[    7.648530] amdgpu 0000:08:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x        (ptrval)
[    7.648593] amdgpu 0000:08:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x        (ptrval)
[    7.648649] amdgpu 0000:08:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x        (ptrval)
[    7.648707] amdgpu 0000:08:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x        (ptrval)
[    7.648733] amdgpu 0000:08:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x        (ptrval)
[    7.648751] amdgpu 0000:08:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x        (ptrval)
[    7.648769] amdgpu 0000:08:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x        (ptrval)
[    7.648782] amdgpu 0000:08:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x        (ptrval)
[    7.649069] amdgpu 0000:08:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x        (ptrval)
[    7.649087] amdgpu 0000:08:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x        (ptrval)
[    7.652849] amdgpu 0000:08:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x        (ptrval)
[    7.652862] amdgpu 0000:08:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x        (ptrval)
[    7.652874] amdgpu 0000:08:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x        (ptrval)
[    7.653353] amdgpu 0000:08:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x        (ptrval)
[    7.653366] amdgpu 0000:08:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x        (ptrval)
[    8.355909] amdgpu 0000:08:00.0: kfd not supported on this ASIC
[    8.355916] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:08:00.0 on minor 4
[    8.355957] amdgpu 0000:09:00.0: enabling device (0000 -> 0003)
[    9.440257] amdgpu 0000:09:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    9.440258] amdgpu 0000:09:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[    9.440268] [drm] amdgpu: 4096M of VRAM memory ready
[    9.440268] [drm] amdgpu: 4096M of GTT memory ready.
[    9.440376] amdgpu 0000:09:00.0: amdgpu: using MSI.
[    9.440390] [drm] amdgpu: irq initialized.
[    9.440406] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[    9.440499] amdgpu 0000:09:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x        (ptrval)
[    9.440563] amdgpu 0000:09:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x        (ptrval)
[    9.440625] amdgpu 0000:09:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x        (ptrval)
[    9.440690] amdgpu 0000:09:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x        (ptrval)
[    9.440753] amdgpu 0000:09:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x        (ptrval)
[    9.440808] amdgpu 0000:09:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x        (ptrval)
[    9.440831] amdgpu 0000:09:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x        (ptrval)
[    9.440849] amdgpu 0000:09:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x        (ptrval)
[    9.440865] amdgpu 0000:09:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x        (ptrval)
[    9.440880] amdgpu 0000:09:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x        (ptrval)
[    9.441167] amdgpu 0000:09:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x        (ptrval)
[    9.441184] amdgpu 0000:09:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x        (ptrval)
[    9.444946] amdgpu 0000:09:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x        (ptrval)
[    9.444964] amdgpu 0000:09:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x        (ptrval)
[    9.444976] amdgpu 0000:09:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x        (ptrval)
[    9.445456] amdgpu 0000:09:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x        (ptrval)
[    9.445469] amdgpu 0000:09:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x        (ptrval)
[   10.147558] amdgpu 0000:09:00.0: kfd not supported on this ASIC
[   10.147564] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:09:00.0 on minor 5
[   10.147606] amdgpu 0000:0a:00.0: enabling device (0000 -> 0003)
[   11.232197] amdgpu 0000:0a:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[   11.232198] amdgpu 0000:0a:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[   11.232207] [drm] amdgpu: 4096M of VRAM memory ready
[   11.232207] [drm] amdgpu: 4096M of GTT memory ready.
[   11.232309] amdgpu 0000:0a:00.0: amdgpu: using MSI.
[   11.232322] [drm] amdgpu: irq initialized.
[   11.232337] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[   11.232427] amdgpu 0000:0a:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x        (ptrval)
[   11.232488] amdgpu 0000:0a:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x        (ptrval)
[   11.232551] amdgpu 0000:0a:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x        (ptrval)
[   11.232615] amdgpu 0000:0a:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x        (ptrval)
[   11.232675] amdgpu 0000:0a:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x        (ptrval)
[   11.232699] amdgpu 0000:0a:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x        (ptrval)
[   11.232717] amdgpu 0000:0a:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x        (ptrval)
[   11.232735] amdgpu 0000:0a:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x        (ptrval)
[   11.232749] amdgpu 0000:0a:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x        (ptrval)
[   11.232763] amdgpu 0000:0a:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x        (ptrval)
[   11.233048] amdgpu 0000:0a:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x        (ptrval)
[   11.233067] amdgpu 0000:0a:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x        (ptrval)
[   11.236830] amdgpu 0000:0a:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x        (ptrval)
[   11.236848] amdgpu 0000:0a:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x        (ptrval)
[   11.236860] amdgpu 0000:0a:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x        (ptrval)
[   11.237341] amdgpu 0000:0a:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x        (ptrval)
[   11.237355] amdgpu 0000:0a:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x        (ptrval)
[   11.939330] amdgpu 0000:0a:00.0: kfd not supported on this ASIC
[   11.939336] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:0a:00.0 on minor 6

РЕДАКТИРОВАТЬ: Это не связано с какой-либо конкретной карты, только первая доступна в порядке шины.

Я попытался отключить некоторые карты, и после всех испытаний кажется, что SYCL всегда перечисляеттолько первый GPU, независимо от того, какой именно, всегда минимальный доступный номер шины.

Это также подтверждает, что между картами нет различий и что все они могут быть использованы (по крайней мере, по отдельности), поэтому ОС, я думаю, в порядке, и я предполагаю, что проблема в SYCL.

Пожалуйста, помогите!

1 Ответ

0 голосов
/ 18 октября 2018

По состоянию на сегодняшнюю дату несколько графических процессоров с Tensorflow и OpenCL в настоящее время не поддерживаются , даже если это не указано явно в документации.

Вы можете отслеживать подробности проблемы здесь, я открыл проблемуна Github: https://github.com/codeplaysoftware/tensorflow/issues/16

Я обновлю этот ответ, если что-то изменится, но, как сказал разработчик, для них это не является приоритетом!

...