Tensorflow Object Detection API: training fails silently
06 January 2020

I am using the Tensorflow Object Detection API with my own dataset. I am currently training "ssd_mobilenet_v1_coco".

Every time I try, training starts but then stops silently at a random point, with no error message. (With the COMMAND below, the command line shows the step count up to some point.) The GPU (CUDA) also seems to stop.

I have already tried changing batch_size ("64" gives the best result) and switching to "ssd_mobilenet_v2_coco".

Is this a problem with a parameter (for example, "sample_1_of_n_eval_examples = 1") or a problem with the GPU?

OS: Windows 10 Tensorflow ver: 1.15 Python: 3.6 CPU: i9-9900K GPU: NVIDIA GeForce RTX 2080

COMMAND I used

python object_detection/model_main.py --pipeline_config_path="C:\Users\MYPATH\models\model\ssd_mobilenet_v1_coco.config" --model_dir="C:\Users\MYPATH\models\model" --num_train_steps=2000 --sample_1_of_n_eval_examples=1 --alsologtostderr
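Since the run ends without any error message, it may help to check how the process actually terminated. The following is a minimal sketch, not part of the original setup: `run_and_report` is a hypothetical helper that launches a command with the standard `subprocess` module and prints its exit code. On Windows, a native crash (for example an access violation, 0xC0000005) typically shows up as a large non-zero exit code even when nothing is printed.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run `cmd` (a list of arguments), let it print normally, and return its exit code.

    Hypothetical helper for diagnosis only; it does not change how training runs.
    """
    proc = subprocess.run(cmd)
    code = proc.returncode
    # Show the code in hex as well: Windows crash codes such as 0xC0000005
    # are easier to recognise in that form.
    print(f"process exited with code {code} (0x{code & 0xFFFFFFFF:08X})")
    return code


# Example usage (assumption: same arguments as the command above):
# run_and_report([sys.executable, "object_detection/model_main.py",
#                 "--pipeline_config_path=...", "--model_dir=...",
#                 "--num_train_steps=2000", "--alsologtostderr"])
```

An exit code of 0 would suggest the script believed it had finished normally, while a non-zero code would point toward a crash in native code or the driver.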

MESSAGE

INFO:tensorflow:Done calling model_fn.
I0106 17:49:29.545947 15104 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
I0106 17:49:29.545947 15104 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
I0106 17:49:32.188141 15104 monitored_session.py:240] Graph was finalized.
2020-01-06 17:49:32.200573: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-01-06 17:49:32.205758: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-01-06 17:49:32.229166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.71
pciBusID: 0000:01:00.0
2020-01-06 17:49:32.232539: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-01-06 17:49:32.236216: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-01-06 17:49:32.239801: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2020-01-06 17:49:32.242368: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2020-01-06 17:49:32.246706: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2020-01-06 17:49:32.250779: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2020-01-06 17:49:32.258807: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-01-06 17:49:32.261581: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-01-06 17:49:32.700705: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-01-06 17:49:32.703645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2020-01-06 17:49:32.705343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2020-01-06 17:49:32.707345: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6271 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:tensorflow:Running local_init_op.
I0106 17:49:35.342885 15104 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I0106 17:49:35.702204 15104 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into C:\Users\mypath\models\model\model.ckpt.
I0106 17:49:42.856755 15104 basic_session_run_hooks.py:606] Saving checkpoints for 0 into C:\Users\mypath\models\model\model.ckpt.
2020-01-06 17:49:51.489601: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-01-06 17:49:52.410981: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-01-06 17:49:52.445252: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
INFO:tensorflow:loss = 33.134163, step = 0
I0106 17:49:55.059146 15104 basic_session_run_hooks.py:262] loss = 33.134163, step = 0
INFO:tensorflow:global_step/sec: 2.58675
I0106 17:50:33.717694 15104 basic_session_run_hooks.py:692] global_step/sec: 2.58675
INFO:tensorflow:loss = 9.563588, step = 100 (38.659 sec)
I0106 17:50:33.717694 15104 basic_session_run_hooks.py:260] loss = 9.563588, step = 100 (38.659 sec)
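To compare where different runs stop, the last logged step can be pulled out of a saved copy of this output. This is a small sketch under the assumption that the log was redirected to a text file; `last_logged_step` is a hypothetical helper that only relies on the `loss = ..., step = ...` line format visible above.

```python
import re

# Matches lines such as "loss = 9.563588, step = 100 (38.659 sec)".
STEP_RE = re.compile(r"loss = (\S+), step = (\d+)")


def last_logged_step(log_text):
    """Return (step, loss) for the last 'loss = ..., step = ...' entry, or None."""
    last = None
    for match in STEP_RE.finditer(log_text):
        last = (int(match.group(2)), float(match.group(1)))
    return last
```

Running this over the logs of several attempts would show whether the stop always happens near the same step (for example around a checkpoint or evaluation) or at a truly random point.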