How can I speed up computation (`tf.nn.conv2d`) by increasing the number of GPUs on GCE? - PullRequest
December 7, 2018

On GCE (Google Compute Engine), to check how the number of GPUs affects computation speed (in this case, `tf.nn.conv2d`), I ran the following code on VM instances with 1 GPU and with 4 GPUs. I expected 4 GPUs to do the computation about 4 times faster than a single GPU, but comparing Output (1) with Output (2), it is not nearly as fast as I thought. Am I missing something? Is there a way to speed up this computation by increasing the number of GPUs?
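As far as I know, TensorFlow 1.x does not split a single op such as `tf.nn.conv2d` across GPUs on its own: when a GPU kernel exists, the op is placed on `/device:GPU:0` by default, and the other GPUs stay idle unless the graph explicitly assigns work to them (for example, one `conv2d` per shard built under `with tf.device('/gpu:%d' % i):`). The sketch below shows only the batch-splitting step in plain Python, assuming the batch dimension N is what gets divided; `shard_sizes` and `shard_bounds` are hypothetical helpers, not TensorFlow APIs.

```python
def shard_sizes(n_items, n_devices):
    """Split n_items as evenly as possible across n_devices.

    The first (n_items % n_devices) shards get one extra item,
    similar to how numpy.array_split distributes a remainder.
    """
    if n_devices <= 0:
        raise ValueError("n_devices must be positive")
    base, extra = divmod(n_items, n_devices)
    return [base + 1 if i < extra else base for i in range(n_devices)]


def shard_bounds(n_items, n_devices):
    """Return (start, stop) slice bounds along the batch axis for each device."""
    bounds, start = [], 0
    for size in shard_sizes(n_items, n_devices):
        bounds.append((start, start + size))
        start += size
    return bounds


print(shard_sizes(100, 4))   # [25, 25, 25, 25]
print(shard_sizes(1, 4))     # [1, 0, 0, 0]
print(shard_bounds(10, 4))   # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

For the N=100 benchmark each GPU would receive 25 images (`arr2_on_channel[start:stop]` for its bounds). For N=1, three of the four shards are empty, so adding GPUs cannot speed up that case at all.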

Code:

# How to run:
# $ ipython3 this.py > stdout.txt
import numpy as np
import tensorflow as tf

# Needed to use the IPython %timeit magic from a script.
from IPython import get_ipython
ipython = get_ipython()

# Define the 2D matrix width and height.
width, height = 1000, 1000

# Define a 2D matrix like the one below.
# 
# | | 1 1 1 ... 1 | |
# | | 1 . ..... 1 | |
# | | ........... | |
# | | 1 1 1 ... 1 | |
#
arr1_N = 1
arr1 = np.ones((arr1_N,height,width), dtype=np.float32)
arr1_on_channel = arr1.reshape((arr1_N, height, width, 1))

# Define many 2D matrices like the ones below.
#
# | | 1 1 1 ... 1 |  | 1 1 1 ... 1 |     | 1 1 1 ... 1 | | 
# | | 1 . ..... 1 |  | 1 . ..... 1 |     | 1 . ..... 1 | | 
# | | ........... |  | ........... |     | ........... | | 
# | | 1 1 1 ... 1 |, | 1 1 1 ... 1 | ,,, | 1 1 1 ... 1 | |
#
arr2_N = 100
arr2 = np.ones((arr2_N,height,width), dtype=np.float32)
arr2_on_channel = arr2.reshape((arr2_N, height, width, 1))

# Define filter matrix for convolution.
# Filter shape must be (filter_height, filter_width, in_channels, out_channels).
# See -> `https://www.tensorflow.org/api_docs/python/tf/nn/conv2d`.
fltr = np.ones((3, 3, 1, 1), dtype=np.float32)

# Define a placeholder that takes a batch of 2D matrices as the convolution input.
inpt = tf.placeholder(tf.float32, shape=(None,height,width,1))

# Define the convolution operation.
strides = (1,1,1,1)
convops = tf.nn.conv2d(inpt, fltr, strides, 'SAME')

# The graph is defined above.
# Now define a function that runs the convolution.
# Note: each call opens a new tf.Session, so timing it also measures session setup.
def calc_conv(arr):
    with tf.Session() as sess:
        out = sess.run(convops, feed_dict={inpt: arr})
    return out

# Run the calculation on arr1.
print("----(N, width, height)=(%d, %d, %d)----"%(arr1_N, width, height))
ipy_cmd = "timeit calc_conv(arr1_on_channel)"
print(ipy_cmd)
ipython.magic(ipy_cmd)
print()

# Run the calculation on arr2.
print("----(N, width, height)=(%d, %d, %d)----"%(arr2_N, width, height))
ipy_cmd = "timeit calc_conv(arr2_on_channel)"
print(ipy_cmd)
ipython.magic(ipy_cmd)
print()
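Each `sess.run` call above also copies the input from host memory to the GPU through `feed_dict`, and the result (same shape, since the convolution uses `'SAME'` padding, stride 1 and one output channel) is copied back. A rough size estimate for the two inputs, assuming float32 (4 bytes per element) as in the code; `tensor_nbytes` is a hypothetical helper. Whether this transfer dominates the measured times is not something these numbers alone prove.

```python
from math import prod  # Python 3.8+


def tensor_nbytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor with the given shape (float32 by default)."""
    return prod(shape) * bytes_per_element


arr1_bytes = tensor_nbytes((1, 1000, 1000, 1))    # shape of arr1_on_channel
arr2_bytes = tensor_nbytes((100, 1000, 1000, 1))  # shape of arr2_on_channel

print(f"{arr1_bytes} bytes = {arr1_bytes / 2**20:.1f} MiB")  # 4000000 bytes = 3.8 MiB
print(f"{arr2_bytes} bytes = {arr2_bytes / 2**20:.1f} MiB")  # 400000000 bytes = 381.5 MiB
```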

I configured the VM to use 1 GPU and ran the script above. Here is the stdout:

Output (1) - 1 GPU:

----(N, width, height)=(1, 1000, 1000)----
timeit calc_conv(arr1_on_channel)
5.19 ms ± 237 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

----(N, width, height)=(100, 1000, 1000)----
timeit calc_conv(arr2_on_channel)
162 ms ± 655 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
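One rough way to read Output (1) is to assume that each call costs a fixed overhead `a` plus a per-image cost `b`, i.e. t(N) = a + b·N. This linear model is an assumption, not something the measurements establish; fitting it through the two measured means gives:

```python
def fit_two_points(n1, t1, n2, t2):
    """Fit t(N) = a + b*N exactly through two (N, time) measurements.

    Returns (a, b): the fixed per-call overhead and the per-item cost,
    in the same time unit as t1 and t2.
    """
    b = (t2 - t1) / (n2 - n1)
    a = t1 - b * n1
    return a, b


# Mean times from Output (1), in milliseconds.
a, b = fit_two_points(1, 5.19, 100, 162.0)
print(f"fixed overhead ~ {a:.2f} ms, per-image cost ~ {b:.2f} ms")
# fixed overhead ~ 3.61 ms, per-image cost ~ 1.58 ms
```

Under this model only the per-image part (about 158 ms of the 162 ms for N=100) could in principle be divided among GPUs. The model ignores run-to-run noise and is only a first approximation.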

Then I configured the VM to use 4 GPUs and ran the script again, but the result was not as fast as I expected. Here is the stdout:

Output (2) - 4 GPUs:

----(N, width, height)=(1, 1000, 1000)----
timeit calc_conv(arr1_on_channel)
8.74 ms ± 663 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

----(N, width, height)=(100, 1000, 1000)----
timeit calc_conv(arr2_on_channel)
154 ms ± 10 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
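To put numbers on "not as fast as I thought": speedup is the single-GPU time divided by the multi-GPU time, and parallel efficiency is speedup divided by the number of GPUs (an ideal 4-way split would give 4.00x and 100%). A small sketch using the mean times from Output (1) and Output (2):

```python
def speedup_and_efficiency(t_single, t_multi, n_devices):
    """Return (speedup, efficiency) where speedup = t_single / t_multi."""
    speedup = t_single / t_multi
    return speedup, speedup / n_devices


# Mean times in milliseconds: (label, 1-GPU time, 4-GPU time).
for label, t1, t4 in [("N=1", 5.19, 8.74), ("N=100", 162.0, 154.0)]:
    s, e = speedup_and_efficiency(t1, t4, 4)
    print(f"{label}: speedup {s:.2f}x, efficiency {e:.0%}")
# N=1: speedup 0.59x, efficiency 15%
# N=100: speedup 1.05x, efficiency 26%
```

Given the ±10 ms spread reported in Output (2), the 162 ms vs 154 ms difference is within noise, which is consistent with the convolution still running on a single GPU in both setups.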

Here is the VM image I used:

Intel® optimized Deep Learning Image: TensorFlow 1.12.0 m13 (with Intel® MKL-DNN/MKL and CUDA 10.0)
A Debian based image with TensorFlow (With CUDA 10.0 and Intel® MKL-DNN, Intel® MKL) plus Intel® optimized NumPy, SciPy, and scikit-learn.

Here is the log output when I ran the script with 4 GPUs:

$ ipython3 convolve2d_tensorflow_simple.py > simple_out_1000x1000_p100x4.txt
2018-12-07 16:36:51.914542: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-07 16:36:53.349588: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-12-07 16:36:53.350163: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:04.0
totalMemory: 15.90GiB freeMemory: 15.61GiB
2018-12-07 16:36:53.484136: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-12-07 16:36:53.484710: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 1 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:05.0
totalMemory: 15.90GiB freeMemory: 15.61GiB
2018-12-07 16:36:53.629408: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-12-07 16:36:53.629992: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 2 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:06.0
totalMemory: 15.90GiB freeMemory: 15.61GiB
2018-12-07 16:36:53.772568: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-12-07 16:36:53.773188: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 3 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:07.0
totalMemory: 15.90GiB freeMemory: 15.61GiB
2018-12-07 16:36:53.776050: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3
2018-12-07 16:36:55.136658: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-07 16:36:55.136721: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1 2 3 
2018-12-07 16:36:55.136729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N Y N N 
2018-12-07 16:36:55.136733: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   Y N N N 
2018-12-07 16:36:55.136736: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2:   N N N Y 
2018-12-07 16:36:55.136740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3:   N N Y N 
2018-12-07 16:36:55.137692: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15129 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
2018-12-07 16:36:55.138322: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15129 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0)
2018-12-07 16:36:55.138717: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15129 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:06.0, compute capability: 6.0)
2018-12-07 16:36:55.139065: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15129 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:07.0, compute capability: 6.0)
2018-12-07 16:36:56.327153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3
2018-12-07 16:36:56.327321: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-07 16:36:56.327338: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1 2 3 
2018-12-07 16:36:56.327343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N Y N N 
2018-12-07 16:36:56.327347: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   Y N N N 
2018-12-07 16:36:56.327354: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2:   N N N Y 
2018-12-07 16:36:56.327362: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3:   N N Y N 
2018-12-07 16:36:56.328156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15129 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
2018-12-07 16:36:56.328381: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15129 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0)
2018-12-07 16:36:56.328652: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15129 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:06.0, compute capability: 6.0)
2018-12-07 16:36:56.328847: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15129 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:07.0, compute capability: 6.0)
2018-12-07 16:36:56.336493: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3
2018-12-07 16:36:56.336621: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-07 16:36:56.336639: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1 2 3 
2018-12-07 16:36:56.336644: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N Y N N 
2018-12-07 16:36:56.336647: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   Y N N N 
2018-12-07 16:36:56.336662: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2:   N N N Y 
2018-12-07 16:36:56.336671: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3:   N N Y N 
2018-12-07 16:36:56.337342: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15129 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
2018-12-07 16:36:56.337481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15129 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0)
2018-12-07 16:36:56.337630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15129 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:06.0, compute capability: 6.0)
2018-12-07 16:36:56.337754: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15129 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:07.0, compute capability: 6.0)
2018-12-07 16:36:56.344687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3
2018-12-07 16:36:56.344806: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-07 16:36:56.344813: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1 2 3 
2018-12-07 16:36:56.344828: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N Y N N 
2018-12-07 16:36:56.344834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   Y N N N 
2018-12-07 16:36:56.344837: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2:   N N N Y 
2018-12-07 16:36:56.344844: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3:   N N Y N 
2018-12-07 16:36:56.345529: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15129 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
2018-12-07 16:36:56.345701: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15129 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0)
2018-12-07 16:36:56.345870: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15129 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:06.0, compute capability: 6.0)
2018-12-07 16:36:56.346015: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15129 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:07.0, compute capability: 6.0)
2018-12-07 16:36:56.353019: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3
2018-12-07 16:36:56.353118: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-07 16:36:56.353125: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1 2 3 
2018-12-07 16:36:56.353129: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N Y N N 
2018-12-07 16:36:56.353132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   Y N N N 
2018-12-07 16:36:56.353135: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2:   N N N Y 
2018-12-07 16:36:56.353139: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3:   N N Y N 
2018-12-07 16:36:56.353805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15129 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
2018-12-07 16:36:56.353941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15129 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0)
2018-12-07 16:36:56.354107: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15129 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:06.0, compute capability: 6.0)
2018-12-07 16:36:56.354270: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15129 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:07.0, compute capability: 6.0)
2018-12-07 16:36:56.361134: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3
2018-12-07 16:36:56.361233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-07 16:36:56.361240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1 2 3 
2018-12-07 16:36:56.361244: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N Y N N 
2018-12-07 16:36:56.361257: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   Y N N N 
2018-12-07 16:36:56.361269: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2:   N N N Y 
2018-12-07 16:36:56.361285: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3:   N N Y N 
2018-12-07 16:36:56.361938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15129 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
2018-12-07 16:36:56.362085: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15129 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0)
2018-12-07 16:36:56.362260: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15129 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:06.0, compute capability: 6.0)
2018-12-07 16:36:56.362412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15129 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:07.0, compute capability: 6.0)
2018-12-07 16:36:56.370310: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3
2018-12-07 16:36:56.370442: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-07 16:36:56.370460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1 2 3 
2018-12-07 16:36:56.370465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N Y N N 
2018-12-07 16:36:56.370468: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   Y N N N 
2018-12-07 16:36:56.370472: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2:   N N N Y 
2018-12-07 16:36:56.370488: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3:   N N Y N 
2018-12-07 16:36:56.371162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15129 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
2018-12-07 16:36:56.371300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15129 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0)
2018-12-07 16:36:56.371455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15129 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:06.0, compute capability: 6.0)
2018-12-07 16:36:56.371582: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15129 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:07.0, compute capability: 6.0)
2018-12-07 16:36:56.378548: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3
2018-12-07 16:36:56.378727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-07 16:36:56.378748: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1 2 3 
2018-12-07 16:36:56.378753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N Y N N 
2018-12-07 16:36:56.378781: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   Y N N N 
2018-12-07 16:36:56.378786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2:   N N N Y 
2018-12-07 16:36:56.378790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3:   N N Y N 
2018-12-07 16:36:56.379497: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15129 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
2018-12-07 16:36:56.379625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15129 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0)
2018-12-07 16:36:56.380007: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15129 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:06.0, compute capability: 6.0)
2018-12-07 16:36:56.380141: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15129 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:07.0, compute capability: 6.0)
2018-12-07 16:36:56.389387: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3
2018-12-07 16:36:56.389497: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-07 16:36:56.389504: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1 2 3 
2018-12-07 16:36:56.389516: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N Y N N 
2018-12-07 16:36:56.389520: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   Y N N N 
2018-12-07 16:36:56.389524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2:   N N N Y 
2018-12-07 16:36:56.389528: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3:   N N Y N 
2018-12-07 16:36:56.390242: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15129 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
2018-12-07 16:36:56.390410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15129 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0)
2018-12-07 16:36:56.390623: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15129 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:06.0, compute capability: 6.0)
2018-12-07 16:36:56.390743: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15129 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:07.0, compute capability: 6.0)
2018-12-07 16:36:57.186091: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3
2018-12-07 16:36:57.186276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-07 16:36:57.186291: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1 2 3 
2018-12-07 16:36:57.186297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N Y N N 
2018-12-07 16:36:57.186301: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   Y N N N 
2018-12-07 16:36:57.186305: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2:   N N N Y 
2018-12-07 16:36:57.186319: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3:   N N Y N 
2018-12-07 16:36:57.187149: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15129 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
2018-12-07 16:36:57.187347: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15129 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0)
2018-12-07 16:36:57.187524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15129 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:06.0, compute capability: 6.0)
2018-12-07 16:36:57.187687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15129 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:07.0, compute capability: 6.0)
2018-12-07 16:36:57.338152: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3
2018-12-07 16:36:57.338309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-07 16:36:57.338342: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1 2 3 
2018-12-07 16:36:57.338347: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N Y N N 
2018-12-07 16:36:57.338350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   Y N N N 
2018-12-07 16:36:57.338355: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2:   N N N Y 
2018-12-07 16:36:57.338358: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3:   N N Y N 
2018-12-07 16:36:57.339170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15129 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
2018-12-07 16:36:57.339390: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15129 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0)
2018-12-07 16:36:57.339552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15129 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:06.0, compute capability: 6.0)
2018-12-07 16:36:57.339744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15129 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:07.0, compute capability: 6.0)
.
.
.
2018-12-07 16:36:58.117850: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-07 16:36:58.117871: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1 2 3 
2018-12-07 16:36:58.117875: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N Y N N 
2018-12-07 16:36:58.117879: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   Y N N N 
2018-12-07 16:36:58.117883: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2:   N N N Y 
2018-12-07 16:36:58.117892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3:   N N Y N 
2018-12-07 16:36:58.118642: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15129 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
2018-12-07 16:36:58.118823: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15129 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0)
2018-12-07 16:36:58.118983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15129 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:06.0, compute capability: 6.0)
2018-12-07 16:36:58.119115: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15129 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:07.0, compute capability: 6.0)
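The log repeats the same "Adding visible gpu devices" / "Created TensorFlow device" block many times. Assuming each block is printed when a new `tf.Session` is created (which would fit `calc_conv` opening a fresh session on every call while `timeit` calls it repeatedly), a small stdlib helper can count these blocks in a saved log. The helper name and the file name `tf_log.txt` below are hypothetical; since TensorFlow writes these messages to stderr, the log could be captured with something like `2> tf_log.txt`.

```python
import re

# Matches the line TensorFlow prints each time it (re)creates its GPU devices.
SETUP_MARKER = re.compile(r"Adding visible gpu devices: ([\d, ]+)")


def count_device_setups(log_text):
    """Return (number of device-setup blocks, sorted list of GPU ids seen)."""
    matches = SETUP_MARKER.findall(log_text)
    gpu_ids = set()
    for ids in matches:
        gpu_ids.update(int(x) for x in ids.split(",") if x.strip())
    return len(matches), sorted(gpu_ids)


sample = """\
I gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3
I gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 ...)
I gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3
"""
print(count_device_setups(sample))  # (2, [0, 1, 2, 3])
```

With a real log, `count_device_setups(open("tf_log.txt").read())` would give the block count. If it grows with the number of `calc_conv` calls, creating the session once outside the timed function would keep session setup out of the measurement.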

Thanks.

...