How to use nGraph with the TensorFlow Object Detection API to improve neural network training performance
3 February 2020

I am trying to use the nGraph compiler to improve performance while training a neural network with the TensorFlow Object Detection API.

```python
...
from object_detection.utils import ops as util_ops
from object_detection.utils import variables_helper
from deployment import model_deploy
import ngraph_bridge                                               # <==== added import
slim = contrib_slim
...
    # Merge all summaries together.
    summary_op = tf.summary.merge(list(summaries), name='summary_op')

    # Soft placement allows placing on CPU ops without GPU implementation.
    session_config = tf.ConfigProto(allow_soft_placement=True,
                                    log_device_placement=False)
    session_config = ngraph_bridge.update_config(session_config)   # <=== updated tf.ConfigProto with ngraph

    # Save checkpoints regularly.
    keep_checkpoint_every_n_hours = train_config.keep_checkpoint_every_n_hours
    saver = tf.train.Saver(
        keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours)
...
```

Based on the "Classify an image" example, this should run the TensorFlow script with nGraph optimization, but instead of improving performance it slows the whole training down by tens of times. Part of the printed logs:

2020-02-03 10:39:35.297752: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2197000000 Hz
2020-02-03 10:39:35.459174: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xb5aab90 executing computations on platform Host. Devices:
2020-02-03 10:39:35.459614: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2020-02-03 10:39:40.571518: I /home/dockuser/ngraph-packaging/ngraph-tf/ngraph_bridge/ngraph_rewrite_pass.cc:235] NGraph using backend: CPU
2020-02-03 10:39:42.256019: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
WARNING:tensorflow:From /home/zekhire/Desktop/TREx/venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
W0203 10:39:47.468499 139703828219712 deprecation.py:323] From /home/zekhire/Desktop/TREx/venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from /home/zekhire/Desktop/nGraph0/models/research/object_detection/faster_rcnn_inception_v2_coco_2018_01_28/model.ckpt
I0203 10:39:47.472506 139703828219712 saver.py:1280] Restoring parameters from /home/zekhire/Desktop/nGraph0/models/research/object_detection/faster_rcnn_inception_v2_coco_2018_01_28/model.ckpt
2020-02-03 10:39:48.281986: I /home/dockuser/ngraph-packaging/ngraph-tf/ngraph_bridge/ngraph_rewrite_pass.cc:235] NGraph using backend: CPU
INFO:tensorflow:Running local_init_op.
I0203 10:39:49.839380 139703828219712 session_manager.py:500] Running local_init_op.
2020-02-03 10:39:50.875042: I /home/dockuser/ngraph-packaging/ngraph-tf/ngraph_bridge/ngraph_rewrite_pass.cc:235] NGraph using backend: CPU
INFO:tensorflow:Done running local_init_op.
I0203 10:39:52.394918 139703828219712 session_manager.py:502] Done running local_init_op.
2020-02-03 10:39:53.162139: I /home/dockuser/ngraph-packaging/ngraph-tf/ngraph_bridge/ngraph_rewrite_pass.cc:235] NGraph using backend: CPU
INFO:tensorflow:Starting Session.
I0203 10:40:05.601890 139703828219712 learning.py:754] Starting Session.
2020-02-03 10:40:05.894126: I /home/dockuser/ngraph-packaging/ngraph-tf/ngraph_bridge/ngraph_rewrite_pass.cc:235] NGraph using backend: CPU
INFO:tensorflow:Saving checkpoint to path training/model.ckpt
I0203 10:40:05.901194 139700815398656 supervisor.py:1117] Saving checkpoint to path training/model.ckpt
INFO:tensorflow:Starting Queues.
I0203 10:40:05.910505 139703828219712 learning.py:768] Starting Queues.
2020-02-03 10:40:09.406874: I /home/dockuser/ngraph-packaging/ngraph-tf/ngraph_bridge/ngraph_rewrite_pass.cc:235] NGraph using backend: CPU
2020-02-03 10:40:11.429237: I /home/dockuser/ngraph-packaging/ngraph-tf/ngraph_bridge/ngraph_rewrite_pass.cc:235] NGraph using backend: CPU
2020-02-03 10:40:11.770679: I /home/dockuser/ngraph-packaging/ngraph-tf/ngraph_bridge/ngraph_rewrite_pass.cc:235] NGraph using backend: CPU
2020-02-03 10:40:19.169635: I /home/dockuser/ngraph-packaging/ngraph-tf/ngraph_bridge/ngraph_rewrite_pass.cc:235] NGraph using backend: CPU
2020-02-03 10:40:22.582859: I /home/dockuser/ngraph-packaging/ngraph-tf/ngraph_bridge/ngraph_rewrite_pass.cc:235] NGraph using backend: CPU
INFO:tensorflow:global_step/sec: 0
I0203 10:40:23.051274 139700806944512 supervisor.py:1099] global_step/sec: 0
2020-02-03 10:40:23.592621: I /home/dockuser/ngraph-packaging/ngraph-tf/ngraph_bridge/ngraph_rewrite_pass.cc:235] NGraph using backend: CPU
INFO:tensorflow:Recording summary at step 0.
I0203 10:41:19.162143 139700253689600 supervisor.py:1050] Recording summary at step 0.
INFO:tensorflow:global_step/sec: 0
I0203 10:42:13.213560 139700806944512 supervisor.py:1099] global_step/sec: 0
INFO:tensorflow:Recording summary at step 1.
I0203 10:43:04.483940 139700253689600 supervisor.py:1050] Recording summary at step 1.
2020-02-03 10:43:31.666862: I /home/dockuser/ngraph-packaging/ngraph-tf/ngraph_bridge/ngraph_rewrite_pass.cc:235] NGraph using backend: CPU
INFO:tensorflow:global step 1: loss = 2.3407 (172.440 sec/step)
I0203 10:43:34.699288 139703828219712 learning.py:507] global step 1: loss = 2.3407 (172.440 sec/step)
2020-02-03 10:43:35.104714: I /home/dockuser/ngraph-packaging/ngraph-tf/ngraph_bridge/ngraph_rewrite_pass.cc:235] NGraph using backend: CPU
INFO:tensorflow:global_step/sec: 0.00836547
I0203 10:44:12.063697 139700806944512 supervisor.py:1099] global_step/sec: 0.00836547
INFO:tensorflow:Recording summary at step 1.
I0203 10:44:45.320101 139700253689600 supervisor.py:1050] Recording summary at step 1.
INFO:tensorflow:global step 2: loss = 2.3211 (129.023 sec/step)
I0203 10:45:46.378940 139703828219712 learning.py:507] global step 2: loss = 2.3211 (129.023 sec/step)
INFO:tensorflow:global_step/sec: 0.00736524
I0203 10:46:27.734549 139700806944512 supervisor.py:1099] global_step/sec: 0.00736524
INFO:tensorflow:global step 3: loss = 2.3794 (116.498 sec/step)
I0203 10:47:43.035150 139703828219712 learning.py:507] global step 3: loss = 2.3794 (116.498 sec/step)
INFO:tensorflow:Recording summary at step 3.
I0203 10:47:43.588864 139700253689600 supervisor.py:1050] Recording summary at step 3.
INFO:tensorflow:global_step/sec: 0.00604844
I0203 10:49:12.995684 139700806944512 supervisor.py:1099] global_step/sec: 0.00604844
INFO:tensorflow:Recording summary at step 3.
I0203 10:49:16.813774 139700253689600 supervisor.py:1050] Recording summary at step 3.
INFO:tensorflow:Saving checkpoint to path training/model.ckpt
I0203 10:50:06.312817 139700815398656 supervisor.py:1117] Saving checkpoint to path training/model.ckpt
INFO:tensorflow:global_step/sec: 0
I0203 10:50:11.974068 139700806944512 supervisor.py:1099] global_step/sec: 0
INFO:tensorflow:Recording summary at step 3.
I0203 10:50:13.481262 139700253689600 supervisor.py:1050] Recording summary at step 3.
INFO:tensorflow:global_step/sec: 0
I0203 10:52:14.711592 139700806944512 supervisor.py:1099] global_step/sec: 0
INFO:tensorflow:global_step/sec: 0
I0203 10:54:11.070465 139700806944512 supervisor.py:1099] global_step/sec: 0
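
To put a number on the slowdown, the per-step times can be pulled out of a log like the one above. This is only a minimal sketch, assuming the log has been saved as plain text; the name `step_times` and the regular expression are illustrative and not part of TensorFlow or nGraph. Each step is logged twice (once with the `INFO:tensorflow:` prefix and once with the `I0203 ...` prefix), so the sketch keeps one value per step.

```python
import re
from statistics import mean

# Matches lines such as:
#   INFO:tensorflow:global step 1: loss = 2.3407 (172.440 sec/step)
STEP_RE = re.compile(r"global step (\d+): loss = ([\d.]+) \(([\d.]+) sec/step\)")


def step_times(log_text):
    """Return {step: seconds_per_step}, collapsing duplicated log lines."""
    times = {}
    for match in STEP_RE.finditer(log_text):
        step, _loss, seconds = match.groups()
        times[int(step)] = float(seconds)  # a repeated step overwrites itself
    return times


# A few lines copied from the log above.
sample_log = """\
INFO:tensorflow:global step 1: loss = 2.3407 (172.440 sec/step)
I0203 10:43:34.699288 139703828219712 learning.py:507] global step 1: loss = 2.3407 (172.440 sec/step)
INFO:tensorflow:global step 2: loss = 2.3211 (129.023 sec/step)
INFO:tensorflow:global step 3: loss = 2.3794 (116.498 sec/step)
"""

times = step_times(sample_log)
print(times)
print(f"mean: {mean(times.values()):.1f} sec/step")
```

For the three steps logged here this gives a mean of roughly 139 sec/step. Running the same function on the log of a run without the `ngraph_bridge` changes would give the baseline to compare against.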

What did I do wrong, and how do I use nGraph correctly to improve neural network training performance?

The operating system I am running this project on is Windows Subsystem for Linux (Ubuntu 18.04 LTS).

...