GPU не сильно улучшает Tensorflow при обучении определенных сетей по сравнению с Google Colab - PullRequest
0 голосов
/ 14 апреля 2019

Я ускорил учебный процесс на компьютере с 32-тактным ОЗУ и графическим процессором Nivida RTX 2070 с 8-Гбайтным ОЗУ.Я использую один и тот же кусок кода на моем компьютере и Colab.Colab показывает, что производительность улучшается в 45 раз.Однако на моем компьютере производительность немного улучшается.Знаете ли вы, как улучшить производительность моего ПК, чтобы быть ближе к колабу?

Код для тестирования:

import tensorflow as tf
import timeit

# See https://www.tensorflow.org/tutorials/using_gpu#allowing_gpu_memory_growth
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.device('/cpu:0'):
    random_image_cpu = tf.random_normal((100, 100, 100, 3))
    net_cpu = tf.layers.conv2d(random_image_cpu, 32, 7)
    net_cpu = tf.reduce_sum(net_cpu)

with tf.device('/gpu:0'):
    random_image_gpu = tf.random_normal((100, 100, 100, 3))
    net_gpu = tf.layers.conv2d(random_image_gpu, 32, 7)
    net_gpu = tf.reduce_sum(net_gpu)

sess = tf.Session(config=config)

# Test execution once to detect errors early.
try:
    sess.run(tf.global_variables_initializer())
except tf.errors.InvalidArgumentError:
    print(
        '\n\nThis error most likely means that this notebook is not '
        'configured to use a GPU.  Change this in Notebook Settings via the '
        'command palette (cmd/ctrl-shift-P) or the Edit menu.\n\n')
    raise


def cpu():
    sess.run(net_cpu)


def gpu():
    sess.run(net_gpu)


# Runs the op several times.
print('Time (s) to convolve 32x7x7x3 filter over random 100x100x100x3 images '
      '(batch x height x width x channel). Sum of ten runs.')
print('CPU (s):')
cpu_time = timeit.timeit('cpu()', number=10, setup="from __main__ import cpu")
print(cpu_time)
print('GPU (s):')
gpu_time = timeit.timeit('gpu()', number=10, setup="from __main__ import gpu")
print(gpu_time)
print('GPU speedup over CPU: {}x'.format(int(cpu_time / gpu_time)))

sess.close()

На Colab производительность:

Time (s) to convolve 32x7x7x3 filter over random 100x100x100x3 images (batch x height x width x channel). Sum of ten runs.
CPU (s):
8.350230318000058
GPU (s):
0.1842791589999706
GPU speedup over CPU: 45x

Производительность моего компьютера с графическим процессором:

WARNING:tensorflow:From F:/Test/gputest/compare.py:10: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
WARNING:tensorflow:From E:\anaconda\envs\tfgpu\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-04-15 12:44:46.534261: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-04-15 12:44:46.713648: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 6.57GiB
2019-04-15 12:44:46.713883: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-15 12:44:47.150893: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-15 12:44:47.151027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-04-15 12:44:47.151110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-04-15 12:44:47.151289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6300 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
Time (s) to convolve 32x7x7x3 filter over random 100x100x100x3 images (batch x height x width x channel). Sum of ten runs.
CPU (s):
2.1350656999999997
GPU (s):
1.3208785
GPU speedup over CPU: 1x
...