Question

I am trying to run the tensorflow-deeplab-v3 model on a server to segment the images I send to it. Everything works fine, but every time I send an image, the model looks for the GPU and creates a new GPU device, and this device creation costs about 10 seconds per image sent. How can I prevent the model from creating the device every time and just reuse the one created earlier?

I tried setting CUDA_VISIBLE_DEVICES, but got the same result. I also tried creating the device myself and running my code on that device, but again the result was the same.

I am running my server on an Amazon EC2 p2.xlarge instance. OS information:

Distributor ID: Ubuntu
Description:    Ubuntu 16.04.6 LTS
Release:    16.04
Codename:   xenial

nvidia-smi output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.40.04    Driver Version: 418.40.04    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   35C    P8    28W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

nvcc --version output:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

Python version: 3.5.2, pip version: 19.1.1. pip list output:

Package              Version        
-------------------- ---------------
absl-py              0.7.1          
astor                0.8.0          
bottle               0.12.16        
certifi              2019.3.9       
chardet              3.0.4          
cycler               0.10.0         
gast                 0.2.2          
get                  2019.4.13      
google-pasta         0.1.7          
grpcio               1.21.1         
h5py                 2.9.0          
idna                 2.8            
Keras-Applications   1.0.8          
Keras-Preprocessing  1.1.0          
kiwisolver           1.1.0          
Markdown             3.1.1          
matplotlib           3.0.3          
mock                 3.0.5          
numpy                1.16.4         
opencv-python        4.1.0.25       
Pillow               6.0.0          
pip                  19.1.1         
post                 2019.4.13      
protobuf             3.8.0          
public               2019.4.13      
pyparsing            2.4.0          
python-dateutil      2.8.0          
query-string         2019.4.13      
request              2019.4.13      
requests             2.22.0         
setuptools           41.0.1         
six                  1.12.0         
tb-nightly           1.14.0a20190614
tensorboard          1.14.0         
tensorflow-estimator 1.14.0         
tensorflow-gpu       1.14.0         
termcolor            1.1.0          
urllib3              1.25.3         
Werkzeug             0.15.4         
wheel                0.33.4         
wrapt                1.11.2

Output for requests after the first one:

78.181.181.107 - - [23/Jun/2019 11:18:20] "GET / HTTP/1.1" 200 0
Request arrived.
...
Writing output masks...
2019-06-23 11:22:42.036040: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.036423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
2019-06-23 11:22:42.036502: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-06-23 11:22:42.036540: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-06-23 11:22:42.036572: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-06-23 11:22:42.036604: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-06-23 11:22:42.036637: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-06-23 11:22:42.036669: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-06-23 11:22:42.036702: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-06-23 11:22:42.036776: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.037106: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.037385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-06-23 11:22:42.037430: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-23 11:22:42.037448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-06-23 11:22:42.037465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-06-23 11:22:42.037643: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.037953: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.038233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10805 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
Preparing paths...
Paths ready. (2.3365020751953125e-05)
generating: /home/ubuntu/Cabin/ModelOutput/test_front_mask.png
Generated. (9.5367431640625e-07)
Prediction took: 11.09858751296997
Cropping /home/ubuntu/Cabin/ModelOutput/test_front_mask.png
Cropped and wrote to file. (0.06068730354309082)
Preparing paths...
Paths ready. (2.4557113647460938e-05)
generating: /home/ubuntu/Cabin/ModelOutput/test_side_mask.png
Generated. (0.0004572868347167969)
Prediction took: 0.47649669647216797
Cropping /home/ubuntu/Cabin/ModelOutput/test_side_mask.png
Cropped and wrote to file. (0.06105923652648926)
Collecting trashes...
All clear! (0.000209808349609375)
Evaluation complete. (11.765886068344116)
Measuring...
Measuring complete. (1.4767637252807617)
78.181.181.107 - - [23/Jun/2019 11:22:48] "GET / HTTP/1.1" 200 0
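
The timings in this log (about 11 s before the first mask, under 0.5 s before the second) are consistent with a lazily evaluated generator: `tf.estimator.Estimator.predict` yields its predictions, so its setup work runs when the first prediction is requested rather than when `predict` is called. The sketch below illustrates only that timing pattern in plain Python; `lazy_predict` and its `setup_seconds` parameter are hypothetical stand-ins, not TensorFlow code.

```python
import time


def lazy_predict(items, setup_seconds=0.2):
    """Hypothetical stand-in for a generator-style predict call."""
    # Simulated one-time setup. Because this is a generator function,
    # nothing here runs until the first next() on the returned generator.
    time.sleep(setup_seconds)
    for item in items:
        yield item * 2


t0 = time.time()
predictions = lazy_predict([1, 2, 3])  # returns immediately, no setup yet
call_seconds = time.time() - t0

t1 = time.time()
first = next(predictions)  # the setup cost is paid here
first_seconds = time.time() - t1

t2 = time.time()
second = next(predictions)  # later items skip the setup
second_seconds = time.time() - t2

print("call:", call_seconds, "first:", first_seconds, "second:", second_seconds)
```

If the same pattern applies to the server, each request that calls `predict` again would pay the setup cost again on its first iteration.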

I embedded the inference script into my own script that runs the server, shown below (here I download the images from a fixed source for testing purposes, and the script is not fully finished yet). It creates the GPU device at line 161, when entering the loop 'for pred_dict, image_path in zipped:':

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import time
import argparse
import os
import glob
from io import BytesIO

import tensorflow as tf
import cv2

import DeepLab.tensorflow_deeplab_v3_plus.deeplab_model as deeplab_model
from DeepLab.tensorflow_deeplab_v3_plus.utils import preprocessing
from DeepLab.tensorflow_deeplab_v3_plus.utils import dataset_util

from PIL import Image
#import matplotlib.pyplot as plt

from tensorflow.python import debug as tf_debug

from bottle import run, post, request, response, route
import requests

import Cropper
import Measure


...


# Using the Winograd non-fused algorithms provides a small performance boost.
os.environ['TF_ENABLE_WINOGRAD_NONFUSED'] = '1'

pred_hooks = None
if FLAGS.debug:
    debug_hook = tf_debug.LocalCLIDebugHook()
    pred_hooks = [debug_hook]

print("Searching for gpus...")
start = time.time()
gpus = tf.config.experimental.list_physical_devices('GPU')
end = time.time()
print("Found all gpus. ("+ str(end-start) + ")")

print("Generating model...")
start = time.time()
model = tf.estimator.Estimator(
    model_fn=deeplab_model.deeplabv3_plus_model_fn,
    model_dir=FLAGS.model_dir,
    params={
      'output_stride': FLAGS.output_stride,
      'batch_size': 1,  # Batch size must be 1 because the images' size may differ
      'base_architecture': FLAGS.base_architecture,
      'pre_trained_model': None,
      'batch_norm_decay': None,
      'num_classes': _NUM_CLASSES,
    })
end = time.time()
print("Model ready. ("+ str(end-start) + ")")

#print("Generating tensorflow session...")
#start = time.time()
#config = tf.ConfigProto()
#sess = tf.Session(config=config)
#end = time.time()
#print("Session created. ("+ str(end-start) + ")")

def evaluate_model(image_list_dir, inference_path, data_path, model_path, model_output_path):
    print("Preparing list...")
    start = time.time()
    # This part looks at the Data folder and writes the names of all files in there into sample_images_list.txt
    imageList = open(image_list_dir, "w")
    for file in os.listdir(data_path):
        imageList.write(str(file)+"\n")
    imageList.close()
    end = time.time()
    print("List generated ("+ str(end-start) + ")")

    print("Loading images...")
    start = time.time()
    # This part runs the model for the current data
    examples = dataset_util.read_examples_list(FLAGS.infer_data_list)
    image_files = [os.path.join(FLAGS.data_dir, filename) for filename in examples]
    end = time.time()
    print("Images loaded ("+ str(end-start) + ")")

    with tf.device("/job:localhost/replica:0/task:0/device:GPU:0"):
        print("Inside device")
        print("Predicting...")
        start = time.time()
        predictions = model.predict(
            input_fn=lambda: preprocessing.eval_input_fn(image_files),
            hooks=pred_hooks)
        end = time.time()
        print("Predictions completed. ("+ str(end-start) + ")")

        output_dir = FLAGS.output_dir
        if not os.path.exists(output_dir):
            os.makedirs(output_dir)

        print("Calling zip function...")
        start = time.time()
        zipped = zip(predictions, image_files)
        end = time.time()
        print("Zip() complete. (" + str(end-start) + ")")

        print("Zipped: " + str(zipped))

        print("Writing output masks...")
        predictionTimeStart = time.time()

        for pred_dict, image_path in zipped:
    #        print("pred_dict is: " + str(pred_dict))

            print("Preparing paths...")
            start = time.time()
            image_basename = os.path.splitext(os.path.basename(image_path))[0]
            output_filename = image_basename + '_mask.png'
            path_to_output = os.path.join(output_dir, output_filename)
            end = time.time()
            print("Paths ready. (" + str(end-start) + ")")

            print("generating:", path_to_output)
            start = time.time()
            mask = pred_dict['decoded_labels']
            end = time.time()
            print("Generated. ("+ str(end-start) + ")")

            # Use this part to also save mask
    #        tmp = Image.fromarray(mask)
    #        plt.axis('off')
    #        plt.imshow(tmp)
    #        plt.savefig(path_to_output, bbox_inches='tight')

            predictionTimeEnd = time.time()
            print("Prediction took: " + str(predictionTimeEnd - predictionTimeStart))

            print("Cropping " + path_to_output)
            start = time.time()
            Cropper.evaluate(path_to_output, cv2.cvtColor(mask, cv2.COLOR_BGR2GRAY))
            end = time.time()
            print("Cropped and wrote to file. ("+ str(end-start) + ")")

            predictionTimeStart = time.time()

        print("Collecting trashes...")
        start = time.time()
        for file in glob.glob(data_path + "*"):
            os.remove(file)
        end = time.time()
        print("All clear! ("+ str(end-start) + ")")


@route('/')#@post('/')
def measure():
    print("Request arrived.")
    try:
        # parse input data
#        try:
#            data = request.json()
#        except:
#            raise ValueError
#
#        if data is None:
#            raise ValueError

        # extract and validate name
        try:
            id = "test"#data['id']
            front_image_url = "https://static1.squarespace.com/static/55b4a361e4b085d388b66c34/t/59709c1903596e8ea44b089e/1501482492586/"#data['front_image_url']
            side_image_url = "https://static1.squarespace.com/static/55b4a361e4b085d388b66c34/t/59709c1903596e8ea44b089e/1501482492586/"#data['side_image_url']
            height = 173#data['height']
            angle = 0#data['angle']
        except (TypeError, KeyError):
            raise ValueError

    except KeyError:
        # if name already exists, return 409 Conflict
        response.status = 409
        return

    try:
        print("Downloading images...")
        start = time.time()
        downloaded_front_image = requests.get(front_image_url)
        downloaded_side_image = requests.get(side_image_url)
        end = time.time()
        print("Download complete. ("+ str(end-start) + ")")
    except(FileNotFoundError, PermissionError, TimeoutError):
        raise ValueError

    print("Preparing images...")
    start = time.time()
    front_image = Image.open(BytesIO(downloaded_front_image.content))
    side_image = Image.open(BytesIO(downloaded_side_image.content))
    end = time.time()
    print("Images ready. ("+ str(end-start) + ")")

    print("Saving images...")
    start = time.time()
    front_image_name = data_path + str(id) + '_front.jpg'
    side_image_name = data_path + str(id) + '_side.jpg'

    front_image.save(front_image_name)
    side_image.save(side_image_name)
    end = time.time()
    print("Images saved. ("+ str(end-start) + ")")

    print("Evaluating model...")
    modelstart = time.time()
    evaluate_model(image_list_dir, inference_path, data_path, model_path, model_output_path)
    modelend = time.time()
    print("Evaluation complete. ("+ str(modelend-modelstart) + ")")

    print("Measuring...")
    start = time.time()
    Measure.evaluate(model_output_path + str(id) + "_front_mask_cropped.png", model_output_path + str(id) + "_side_mask_cropped.png", height, angle, id)
    end = time.time()
    print("Measuring complete. (" + str(end-start) + ")")

    pass

run(host=FLAGS.private_ip, port=FLAGS.port)

I want to minimize inference time, so I want to be able to create the device once and then reuse that same device for every subsequent image.
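
One way to picture "create once, reuse" is to keep a single prediction generator alive for the lifetime of the server and feed it requests through a queue, instead of calling `predict` per request. The following is a minimal sketch of that pattern in plain Python, not a tested TensorFlow solution: `PersistentPredictor`, `fake_predict` and `setup_calls` are hypothetical names, and `fake_predict` merely stands in for a generator-style call such as `Estimator.predict`.

```python
import queue
import threading


class PersistentPredictor:
    """Keeps one prediction generator alive and feeds it through a queue."""

    def __init__(self, predict_fn):
        # predict_fn takes an input-generator function and returns a
        # generator of results (a stand-in for a generator-style predict).
        self._predict_fn = predict_fn
        self._inputs = queue.Queue()
        self._outputs = queue.Queue()
        self._lock = threading.Lock()  # one request in flight at a time
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _input_stream(self):
        # Blocks until the next request arrives; None means shut down.
        while True:
            item = self._inputs.get()
            if item is None:
                return
            yield item

    def _run(self):
        # The expensive setup inside predict_fn happens once, in this thread.
        for result in self._predict_fn(self._input_stream):
            self._outputs.put(result)

    def predict(self, item):
        with self._lock:
            self._inputs.put(item)
            return self._outputs.get()

    def close(self):
        self._inputs.put(None)
        self._thread.join()


setup_calls = []


def fake_predict(input_fn):
    """Hypothetical stand-in: records its one-time setup, then streams."""
    setup_calls.append("setup")
    for item in input_fn():
        yield item.upper()


predictor = PersistentPredictor(fake_predict)
first = predictor.predict("front")
second = predictor.predict("side")
predictor.close()
print(first, second, len(setup_calls))
```

In TensorFlow 1.x the same idea would require `input_fn` to build its dataset from such a blocking generator (for example with `tf.data.Dataset.from_generator`); whether that fits `preprocessing.eval_input_fn` in this project is an assumption that has not been checked here.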

TensorFlow creating new devices multiple times
