I am trying to run the tensorflow-deeplab-v3 model on a server to segment the images I send to it. Everything works, but every time I send an image the model looks for the GPU and creates a new GPU device, and this device creation costs about 10 seconds for every image I send. How can I stop the model from creating a device each time and have it reuse the one it already created?
I tried setting CUDA_VISIBLE_DEVICES, but the result was the same. I also tried creating a device and running my code on that device, but again got the same result.
I am running my server on an Amazon EC2 p2.xlarge instance. OS information:
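For reference, a minimal sketch of what the first attempt can look like, assuming the variable is set from Python before TensorFlow is imported. The value "0" is an assumption based on the single Tesla K80 listed by nvidia-smi. As far as I understand, this variable only controls which GPUs are visible to the process, so it would not by itself change how often a device is created.

```python
import os

# Assumption: set this before TensorFlow is imported, so the CUDA runtime
# only sees GPU index 0 (the single Tesla K80 on this instance).
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

visible = os.environ["CUDA_VISIBLE_DEVICES"]
print("Visible GPUs:", visible)
```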
Distributor ID: Ubuntu
Description: Ubuntu 16.04.6 LTS
Release: 16.04
Codename: xenial
nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.40.04 Driver Version: 418.40.04 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 On | 00000000:00:1E.0 Off | 0 |
| N/A 35C P8 28W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
nvcc --version output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
Python version: 3.5.2
pip version: 19.1.1
pip list output:
Package Version
-------------------- ---------------
absl-py 0.7.1
astor 0.8.0
bottle 0.12.16
certifi 2019.3.9
chardet 3.0.4
cycler 0.10.0
gast 0.2.2
get 2019.4.13
google-pasta 0.1.7
grpcio 1.21.1
h5py 2.9.0
idna 2.8
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.0
kiwisolver 1.1.0
Markdown 3.1.1
matplotlib 3.0.3
mock 3.0.5
numpy 1.16.4
opencv-python 4.1.0.25
Pillow 6.0.0
pip 19.1.1
post 2019.4.13
protobuf 3.8.0
public 2019.4.13
pyparsing 2.4.0
python-dateutil 2.8.0
query-string 2019.4.13
request 2019.4.13
requests 2.22.0
setuptools 41.0.1
six 1.12.0
tb-nightly 1.14.0a20190614
tensorboard 1.14.0
tensorflow-estimator 1.14.0
tensorflow-gpu 1.14.0
termcolor 1.1.0
urllib3 1.25.3
Werkzeug 0.15.4
wheel 0.33.4
wrapt 1.11.2
Output for requests after the first one:
78.181.181.107 - - [23/Jun/2019 11:18:20] "GET / HTTP/1.1" 200 0
Request arrived.
...
Writing output masks...
2019-06-23 11:22:42.036040: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.036423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
2019-06-23 11:22:42.036502: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-06-23 11:22:42.036540: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-06-23 11:22:42.036572: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-06-23 11:22:42.036604: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-06-23 11:22:42.036637: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-06-23 11:22:42.036669: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-06-23 11:22:42.036702: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-06-23 11:22:42.036776: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.037106: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.037385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-06-23 11:22:42.037430: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-23 11:22:42.037448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-06-23 11:22:42.037465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-06-23 11:22:42.037643: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.037953: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.038233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10805 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
Preparing paths...
Paths ready. (2.3365020751953125e-05)
generating: /home/ubuntu/Cabin/ModelOutput/test_front_mask.png
Generated. (9.5367431640625e-07)
Prediction took: 11.09858751296997
Cropping /home/ubuntu/Cabin/ModelOutput/test_front_mask.png
Cropped and wrote to file. (0.06068730354309082)
Preparing paths...
Paths ready. (2.4557113647460938e-05)
generating: /home/ubuntu/Cabin/ModelOutput/test_side_mask.png
Generated. (0.0004572868347167969)
Prediction took: 0.47649669647216797
Cropping /home/ubuntu/Cabin/ModelOutput/test_side_mask.png
Cropped and wrote to file. (0.06105923652648926)
Collecting trashes...
All clear! (0.000209808349609375)
Evaluation complete. (11.765886068344116)
Measuring...
Measuring complete. (1.4767637252807617)
78.181.181.107 - - [23/Jun/2019 11:22:48] "GET / HTTP/1.1" 200 0
I embedded the inference script in my own script that runs the server, shown below (here I download the images from a fixed source for testing purposes, and the script is not fully finished yet). It creates the GPU device at line 161 of the full script, on entering the loop 'for pred_dict, image_path in zipped:':
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import time
import argparse
import os
import glob
from io import BytesIO
import tensorflow as tf
import cv2
import DeepLab.tensorflow_deeplab_v3_plus.deeplab_model as deeplab_model
from DeepLab.tensorflow_deeplab_v3_plus.utils import preprocessing
from DeepLab.tensorflow_deeplab_v3_plus.utils import dataset_util
from PIL import Image
#import matplotlib.pyplot as plt
from tensorflow.python import debug as tf_debug
from bottle import run, post, request, response, route
import requests
import Cropper
import Measure
...
# Using the Winograd non-fused algorithms provides a small performance boost.
os.environ['TF_ENABLE_WINOGRAD_NONFUSED'] = '1'
pred_hooks = None
if FLAGS.debug:
    debug_hook = tf_debug.LocalCLIDebugHook()
    pred_hooks = [debug_hook]

print("Searching for gpus...")
start = time.time()
gpus = tf.config.experimental.list_physical_devices('GPU')
end = time.time()
print("Found all gpus. ("+ str(end-start) + ")")

print("Generating model...")
start = time.time()
model = tf.estimator.Estimator(
    model_fn=deeplab_model.deeplabv3_plus_model_fn,
    model_dir=FLAGS.model_dir,
    params={
        'output_stride': FLAGS.output_stride,
        'batch_size': 1,  # Batch size must be 1 because the images' size may differ
        'base_architecture': FLAGS.base_architecture,
        'pre_trained_model': None,
        'batch_norm_decay': None,
        'num_classes': _NUM_CLASSES,
    })
end = time.time()
print("Model ready. ("+ str(end-start) + ")")
#print("Generating tensorflow session...")
#start = time.time()
#config = tf.ConfigProto()
#sess = tf.Session(config=config)
#end = time.time()
#print("Session created. ("+ str(end-start) + ")")
def evaluate_model(image_list_dir, inference_path, data_path, model_path, model_output_path):
    print("Preparing list...")
    start = time.time()
    # This part looks at the Data folder and writes the name of all files in there into sample_images_list.txt
    imageList = open(image_list_dir, "w")
    for file in os.listdir(data_path):
        imageList.write(str(file)+"\n")
    imageList.close()
    end = time.time()
    print("List generated ("+ str(end-start) + ")")

    print("Loading images...")
    start = time.time()
    # This part runs the model for the current data
    examples = dataset_util.read_examples_list(FLAGS.infer_data_list)
    image_files = [os.path.join(FLAGS.data_dir, filename) for filename in examples]
    end = time.time()
    print("Images loaded ("+ str(end-start) + ")")

    with tf.device("/job:localhost/replica:0/task:0/device:GPU:0"):
        print("Inside device")
        print("Predicting...")
        start = time.time()
        predictions = model.predict(
            input_fn=lambda: preprocessing.eval_input_fn(image_files),
            hooks=pred_hooks)
        end = time.time()
        print("Predictions completed. ("+ str(end-start) + ")")

        output_dir = FLAGS.output_dir
        if not os.path.exists(output_dir):
            os.makedirs(output_dir)

        print("Calling zip function...")
        start = time.time()
        zipped = zip(predictions, image_files)
        end = time.time()
        print("Zip() complete. (" + str(end-start) + ")")
        print("Zipped: " + str(zipped))

        print("Writing output masks...")
        predictionTimeStart = time.time()
        for pred_dict, image_path in zipped:
            # print("pred_dict is: " + str(pred_dict))
            print("Preparing paths...")
            start = time.time()
            image_basename = os.path.splitext(os.path.basename(image_path))[0]
            output_filename = image_basename + '_mask.png'
            path_to_output = os.path.join(output_dir, output_filename)
            end = time.time()
            print("Paths ready. (" + str(end-start) + ")")

            print("generating:", path_to_output)
            start = time.time()
            mask = pred_dict['decoded_labels']
            end = time.time()
            print("Generated. ("+ str(end-start) + ")")

            # Use this part to also save mask
            # tmp = Image.fromarray(mask)
            # plt.axis('off')
            # plt.imshow(tmp)
            # plt.savefig(path_to_output, bbox_inches='tight')
            predictionTimeEnd = time.time()
            print("Prediction took: " + str(predictionTimeEnd - predictionTimeStart))

            print("Cropping " + path_to_output)
            start = time.time()
            Cropper.evaluate(path_to_output, cv2.cvtColor(mask, cv2.COLOR_BGR2GRAY))
            end = time.time()
            print("Cropped and wrote to file. ("+ str(end-start) + ")")
            predictionTimeStart = time.time()

    print("Collecting trashes...")
    start = time.time()
    for file in glob.glob(data_path + "*"):
        os.remove(file)
    end = time.time()
    print("All clear! ("+ str(end-start) + ")")

@route('/')#@post('/')
def measure():
    print("Request arrived.")
    try:
        # parse input data
        # try:
        #     data = request.json()
        # except:
        #     raise ValueError
        #
        # if data is None:
        #     raise ValueError

        # extract and validate name
        try:
            id = "test"#data['id']
            front_image_url = "https://static1.squarespace.com/static/55b4a361e4b085d388b66c34/t/59709c1903596e8ea44b089e/1501482492586/"#data['front_image_url']
            side_image_url = "https://static1.squarespace.com/static/55b4a361e4b085d388b66c34/t/59709c1903596e8ea44b089e/1501482492586/"#data['side_image_url']
            height = 173#data['height']
            angle = 0#data['angle']
        except (TypeError, KeyError):
            raise ValueError
    except KeyError:
        # if name already exists, return 409 Conflict
        response.status = 409
        return

    try:
        print("Downloading images...")
        start = time.time()
        downloaded_front_image = requests.get(front_image_url)
        downloaded_side_image = requests.get(side_image_url)
        end = time.time()
        print("Download complete. ("+ str(end-start) + ")")
    except(FileNotFoundError, PermissionError, TimeoutError):
        raise ValueError

    print("Preparing images...")
    start = time.time()
    front_image = Image.open(BytesIO(downloaded_front_image.content))
    side_image = Image.open(BytesIO(downloaded_side_image.content))
    end = time.time()
    print("Images ready. ("+ str(end-start) + ")")

    print("Saving images...")
    start = time.time()
    front_image_name = data_path + str(id) + '_front.jpg'
    side_image_name = data_path + str(id) + '_side.jpg'
    front_image.save(front_image_name)
    side_image.save(side_image_name)
    end = time.time()
    print("Images saved. ("+ str(end-start) + ")")

    print("Evaluating model...")
    modelstart = time.time()
    evaluate_model(image_list_dir, inference_path, data_path, model_path, model_output_path)
    modelend = time.time()
    print("Evaluation complete. ("+ str(modelend-modelstart) + ")")

    print("Measuring...")
    start = time.time()
    Measure.evaluate(model_output_path + str(id) + "_front_mask_cropped.png", model_output_path + str(id) + "_side_mask_cropped.png", height, angle, id)
    end = time.time()
    print("Measuring complete. (" + str(end-start) + ")")
    pass

run(host=FLAGS.private_ip, port=FLAGS.port)
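
One possible reading of the log: as far as I understand, Estimator.predict returns a generator, so the call itself returns almost immediately and the expensive setup only runs when the first item is pulled inside the for loop. That would match "Prediction took: 11.09..." for the first mask and about 0.48 for the second. The sketch below only illustrates that lazy behaviour in plain Python; slow_predict is a hypothetical stand-in and the 0.2 second sleep is an arbitrary placeholder for the setup cost.

```python
import time


def slow_predict(items):
    # Hypothetical stand-in for a lazy predict call: nothing in this body
    # runs until the caller asks for the first result.
    time.sleep(0.2)  # simulated one-time setup (graph build, device creation)
    for item in items:
        yield item * 2


start = time.time()
predictions = slow_predict([1, 2])  # returns a generator immediately
call_time = time.time() - start

start = time.time()
first = next(predictions)  # the setup cost is paid here
first_item_time = time.time() - start

start = time.time()
second = next(predictions)  # later items skip the setup
second_item_time = time.time() - start

print("call:", call_time, "first item:", first_item_time, "second item:", second_item_time)
```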
I want to minimize inference time, so I want to be able to create the device once and then reuse that same device for every subsequent image.
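
To make the goal concrete, here is a rough sketch of the "set up once, reuse per request" pattern I have in mind, written in plain Python so it runs without TensorFlow. PersistentPredictor and fake_predict are hypothetical names, and fake_predict only stands in for a lazy predict call. It assumes the predict function yields exactly one result per input, in order. Wiring this to a real Estimator would presumably need an input_fn that wraps the generator (for example with tf.data.Dataset.from_generator), which I have not verified here.

```python
import queue
import threading


class PersistentPredictor:
    """Keeps a single predict loop alive and feeds it requests through a queue."""

    def __init__(self, predict_fn):
        # predict_fn: takes an iterable of inputs and lazily yields one result
        # per input (assumption; a stand-in for something like Estimator.predict).
        self._inputs = queue.Queue()
        self._outputs = queue.Queue()
        self._lock = threading.Lock()  # serialise callers so results are not mixed up
        self._worker = threading.Thread(
            target=self._run, args=(predict_fn,), daemon=True)
        self._worker.start()

    def _input_stream(self):
        # Blocks until the next request arrives, so the predict loop never finishes
        # between requests; None is the shutdown signal.
        while True:
            item = self._inputs.get()
            if item is None:
                return
            yield item

    def _run(self, predict_fn):
        for result in predict_fn(self._input_stream()):
            self._outputs.put(result)

    def predict(self, item):
        with self._lock:
            self._inputs.put(item)
            return self._outputs.get()

    def close(self):
        self._inputs.put(None)
        self._worker.join()


setup_calls = []


def fake_predict(inputs):
    # Hypothetical model: the expensive setup happens once, on the first pull.
    setup_calls.append("device created")
    for name in inputs:
        yield name.upper()


predictor = PersistentPredictor(fake_predict)
first = predictor.predict("front")
second = predictor.predict("side")
predictor.close()
print(first, second, "setups:", len(setup_calls))
```

In this sketch the setup inside fake_predict runs once even though two requests are served, which is the behaviour I would like from the real model.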