TensorFlow hangs when running inference with GPU enabled
0 votes
/ 23 April 2020

I'm new to AI and TensorFlow and I'm trying to use the TensorFlow Object Detection API on Windows.
My current goal is real-time person detection in a video stream.
For this I modified a Python example from the TensorFlow Model Garden (https://github.com/tensorflow/models).
At the moment it detects all objects (not just people) and draws bounding boxes using OpenCV.

It works fine when the GPU is disabled (os.environ["CUDA_VISIBLE_DEVICES"] = "-1").
But when I enable the GPU and run the script, it hangs on the first frame.

Output:

2020-04-22 16:00:53.597492: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-22 16:00:56.942141: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-04-22 16:00:56.976635: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 960M computeCapability: 5.0
coreClock: 1.176GHz coreCount: 5 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 74.65GiB/s
2020-04-22 16:00:56.989129: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-22 16:00:57.000622: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-22 16:00:57.012247: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-22 16:00:57.020575: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-22 16:00:57.031536: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-22 16:00:57.042564: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-22 16:00:57.066289: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-22 16:00:57.075760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-04-22 16:00:59.239211: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-04-22 16:00:59.256577: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1f3f73cd670 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-22 16:00:59.264241: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-04-22 16:00:59.272280: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 960M computeCapability: 5.0
coreClock: 1.176GHz coreCount: 5 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 74.65GiB/s
2020-04-22 16:00:59.281409: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-22 16:00:59.288204: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-22 16:00:59.293112: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-22 16:00:59.298222: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-22 16:00:59.305446: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-22 16:00:59.310590: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-22 16:00:59.316250: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-22 16:00:59.324588: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-04-22 16:01:00.831569: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-22 16:01:00.839147: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0
2020-04-22 16:01:00.842279: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N
2020-04-22 16:01:00.846140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1024 MB memory) -> physical GPU (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0)
2020-04-22 16:01:00.865546: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1f39174cba0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-04-22 16:01:00.873656: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 960M, Compute Capability 5.0
[<tf.Tensor 'image_tensor:0' shape=(None, None, None, 3) dtype=uint8>]
2020-04-22 16:01:10.876733: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-22 16:01:11.814909: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
2020-04-22 16:01:11.852909: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-22 16:01:12.149312: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.04GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.179484: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.04GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.209036: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.237205: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.05GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.266147: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.09GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.295182: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.08GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.325645: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.15GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.357550: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.15GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.405332: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.14GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.436336: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.27GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

This is the code I'm using:

#!/usr/bin/env python
# coding: utf-8

import os
import pathlib

if "models" in pathlib.Path.cwd().parts:
  while "models" in pathlib.Path.cwd().parts:
    os.chdir('..')

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile

from collections import defaultdict
from io import StringIO
from PIL import Image
from IPython.display import display

import cv2 
cap = cv2.VideoCapture(1)

from object_detection.utils import ops as utils_ops
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

# patch tf1 into `utils.ops`
utils_ops.tf = tf.compat.v1

# Patch the location of gfile
tf.gfile = tf.io.gfile

# os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

def load_model(model_name):
  base_url = 'http://download.tensorflow.org/models/object_detection/'
  model_file = model_name + '.tar.gz'
  model_dir = tf.keras.utils.get_file(
    fname=model_name, 
    origin=base_url + model_file,
    untar=True)

  model_dir = pathlib.Path(model_dir)/"saved_model"

  model = tf.saved_model.load(str(model_dir))
  model = model.signatures['serving_default']

  return model

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = 'models/research/object_detection/data/mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)

model_name = 'ssd_mobilenet_v1_coco_2017_11_17'
# model_name= 'faster_rcnn_inception_v2_coco_2017_11_08';
detection_model = load_model(model_name)

print(detection_model.inputs)

detection_model.output_dtypes
detection_model.output_shapes

def run_inference_for_single_image(model, image):
    image = np.asarray(image)
    # The input needs to be a tensor, convert it using `tf.convert_to_tensor`.
    input_tensor = tf.convert_to_tensor(image)
    # The model expects a batch of images, so add an axis with `tf.newaxis`.
    input_tensor = input_tensor[tf.newaxis,...]

    # Run inference (it hangs here)
    output_dict = model(input_tensor)

    # All outputs are batches tensors.
    # Convert to numpy arrays, and take index [0] to remove the batch dimension.
    # We're only interested in the first num_detections.
    num_detections = int(output_dict.pop('num_detections'))
    output_dict = {key:value[0, :num_detections].numpy() 
                 for key,value in output_dict.items()}
    output_dict['num_detections'] = num_detections

    # detection_classes should be ints.
    output_dict['detection_classes'] = output_dict['detection_classes'].astype(np.int64)

    # Handle models with masks:
    if 'detection_masks' in output_dict:
        # Reframe the the bbox mask to the image size.
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(output_dict['detection_masks'], output_dict['detection_boxes'],image.shape[0], image.shape[1])      
        detection_masks_reframed = tf.cast(detection_masks_reframed > 0.5,tf.uint8)
        output_dict['detection_masks_reframed'] = detection_masks_reframed.numpy()

    return output_dict

def show_inference(model):
    # the array based representation of the image will be used later in order to prepare the
    # result image with boxes and labels on it.
    ret, image_np = cap.read()

    #percent by which the image is resized
    #scale_percent = 30

    #calculate the 50 percent of original dimensions
    #width = int(image_np.shape[1] * scale_percent / 100)
    #height = int(image_np.shape[0] * scale_percent / 100)

    # dsize
    #dsize = (width, height)

    # resize image
    #image_np = cv2.resize(image_np, dsize)

    # Actual detection.
    output_dict = run_inference_for_single_image(model, image_np)

    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
      image_np,
      output_dict['detection_boxes'],
      output_dict['detection_classes'],
      output_dict['detection_scores'],
      category_index,
      instance_masks=output_dict.get('detection_masks_reframed', None),
      use_normalized_coordinates=True,
      line_thickness=8)

    cv2.imshow('object detection', cv2.resize(image_np, (800,600)))

while True:
  show_inference(detection_model)
  if cv2.waitKey(25) & 0xFF == ord('q'):
    cv2.destroyAllWindows()
    break

I have the following versions installed:
Python: 3.7, 64-bit
TensorFlow: 2.2.0-rc3
CUDA: 10.1
cuDNN: 7.6.5.32

I tried this on 2 different machines:
Machine 1:
- CPU: i7-6700HQ
- RAM: 16 GB
- GPU: NVIDIA GeForce GTX 960M

Machine 2:
- CPU: i5-6400
- RAM: 16 GB
- GPU: NVIDIA GeForce GTX 960

I'm not sure how to debug this. I tried the same code on the two machines and the result was almost identical.
The only difference was how long it took before hanging: machine 1 hangs immediately, while machine 2 takes about 30 seconds.
Machine 2 manages to process the video and detect objects right up until the hang.

I looked into the "Allocator (GPU_0_bfc) ran out of memory" warning.
I tried some options that limit the amount of GPU memory available to TensorFlow, but it didn't help.
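
What I tried was roughly the following (a sketch of the idea rather than my exact code; it has to run before the model is loaded, i.e. before the first GPU operation):

    import tensorflow as tf

    gpus = tf.config.experimental.list_physical_devices('GPU')
    if gpus:
        # Option 1: allocate GPU memory on demand instead of grabbing it all up front.
        tf.config.experimental.set_memory_growth(gpus[0], True)

        # Option 2 (alternative to option 1): hard-cap the memory TensorFlow may use,
        # here roughly 1 GB.
        # tf.config.experimental.set_virtual_device_configuration(
        #     gpus[0],
        #     [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])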

There were also several posts suggesting to reduce the batch size.
My interpretation was that this only helps when training your own model,
and since I'm using pre-trained models, it doesn't apply here.

I also tried different models: ssd_mobilenet_v1_coco_2017_11_17 and faster_rcnn_inception_v2_coco_2017_11_08. Both models give the same result.

The last thing I tried was reducing the image size before processing it. That didn't help either.

Any help would be greatly appreciated.

Update
I also tried this on an RTX 2070 SUPER GPU. There are no memory allocation warnings there, but it still does not complete even a single inference. For completeness, here is the console output (the text "inference start" is printed right before running inference; "inference end" would be printed if the inference ever finished):

2020-04-24 11:30:16.579805: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-24 11:30:18.916146: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-04-24 11:30:18.941805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.785GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2020-04-24 11:30:18.946134: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-24 11:30:18.951172: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-24 11:30:18.954809: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-24 11:30:18.957258: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-24 11:30:18.961662: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-24 11:30:18.965553: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-24 11:30:18.978671: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-24 11:30:18.980998: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-24 11:30:18.982226: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-04-24 11:30:18.984167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.785GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2020-04-24 11:30:18.987291: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-24 11:30:18.988809: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-24 11:30:18.990303: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-24 11:30:18.991792: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-24 11:30:18.993320: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-24 11:30:18.996960: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-24 11:30:18.998497: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-24 11:30:19.000191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-24 11:30:19.430864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-24 11:30:19.433076: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2020-04-24 11:30:19.434566: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2020-04-24 11:30:19.436400: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6281 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
[<tf.Tensor 'image_tensor:0' shape=(None, None, None, 3) dtype=uint8>]
inference start
2020-04-24 11:30:24.728554: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-24 11:30:25.608426: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-04-24 11:30:25.625904: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll

Update 2
With eager mode disabled, everything works fine (even on the GPU), but then I can't retrieve the detected objects.
The next thing I tried was running it with sessions (TensorFlow 1 style). There, session.run() blocks indefinitely on the GPU. On the CPU it again works fine.
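
For reference, those experiments were along these lines (a rough sketch; the names fetches, input_placeholder and image_batch are placeholders, not my actual variables):

    import tensorflow as tf

    # Disabling eager mode: with this the GPU inference no longer hangs,
    # but model(input_tensor) returns symbolic tensors instead of concrete results.
    tf.compat.v1.disable_eager_execution()

    # TF1-style attempt: sess.run() is the call that blocks indefinitely on the GPU.
    # with tf.compat.v1.Session() as sess:
    #     outputs = sess.run(fetches, feed_dict={input_placeholder: image_batch})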

1 Answer

0 votes
/ 24 April 2020

If you're using a GPU, try installing tensorflow-gpu. The TensorFlow build you're using should already support the GPU according to the documentation, but you can try specifying it explicitly. Try this in a Python virtual environment first.

First, uninstall tensorflow:

    pip uninstall tensorflow

Uninstall tensorflow-gpu (make sure to run this even if you're not sure you ever installed it):

    pip uninstall tensorflow-gpu

Install a specific tensorflow-gpu version:

    pip install tensorflow-gpu==2.0.0
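
After reinstalling, you can check that TensorFlow actually sees the GPU with something like this (using the experimental API so it works on both 2.0 and 2.2):

    import tensorflow as tf

    # Should print the version and a non-empty list of physical GPU devices.
    print(tf.__version__)
    print(tf.config.experimental.list_physical_devices('GPU'))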
...