How to fix: error when resuming training from a checkpoint - PullRequest
September 23, 2019

I recently started training a DeepSpeech model using Mozilla's TensorFlow implementation. I can't seem to resume training from the last checkpoint.

I tried reloading the model and starting over, but that doesn't help. I've considered resuming training from the second-to-last checkpoint. Would that help, instead of using the last one?

The command I ran: (deepspeech-train-venv) chabani@chabani-VirtualBox:~/Train/DeepSpeech$ ./DeepSpeech.py --train_files /media/sf_en/clips/train.csv --dev_files /media/sf_en/clips/dev.csv --test_files /media/sf_en/clips/test.csv --checkpoint_dir

I expected training to resume, but I get the following error messages:

Traceback (most recent call last):
  File "/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.DataLossError: Checksum does not match: stored 1933602204 vs. calculated on the restored bytes 887113119
     [[{{node save/RestoreV2}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./DeepSpeech.py", line 903, in <module>
    absl.app.run(main)
  File "/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "./DeepSpeech.py", line 887, in main
    train()
  File "./DeepSpeech.py", line 529, in train
    loaded = try_loading(session, checkpoint_saver, checkpoint_filename, 'most recent')
  File "./DeepSpeech.py", line 401, in try_loading
    saver.restore(session, checkpoint_path)
  File "/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 1286, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.DataLossError: Checksum does not match: stored 1933602204 vs. calculated on the restored bytes 887113119
     [[node save/RestoreV2 (defined at ./DeepSpeech.py:467) ]]

Original stack trace for 'save/RestoreV2':
  File "./DeepSpeech.py", line 903, in <module>
    absl.app.run(main)
  File "/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "./DeepSpeech.py", line 887, in main
    train()
  File "./DeepSpeech.py", line 467, in train
    checkpoint_saver = tfv1.train.Saver(max_to_keep=FLAGS.max_to_keep)
  File "/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 825, in __init__
    self.build()
  File "/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 837, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 875, in _build
    build_restore=build_restore)
  File "/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

Two exceptions are raised, as shown above. Any help would be appreciated.
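For reference, the second-to-last-checkpoint idea could be tried roughly as follows. The `DataLossError: Checksum does not match` message means the bytes read from the checkpoint data file do not match the checksum stored with them, so the newest checkpoint file is most likely damaged. This is only a sketch under assumptions: it assumes the checkpoint directory contains TensorFlow's default state file named `checkpoint`, that the state file still lists an older checkpoint, and that the older checkpoint's files are still on disk and intact. The helper names `read_state` and `point_to_previous` are hypothetical and are not part of DeepSpeech or TensorFlow.

```python
import re
import shutil
from pathlib import Path

# Lines in a TensorFlow checkpoint state file look like:
#   model_checkpoint_path: "train-300"
#   all_model_checkpoint_paths: "train-200"
_ENTRY = re.compile(
    r'^(model_checkpoint_path|all_model_checkpoint_paths):\s*"(.*)"\s*$'
)


def read_state(state_file):
    """Return (current_path, all_paths) parsed from a checkpoint state file."""
    current, all_paths = None, []
    for line in Path(state_file).read_text().splitlines():
        match = _ENTRY.match(line)
        if not match:
            continue  # skip other fields, e.g. timestamps
        if match.group(1) == "model_checkpoint_path":
            current = match.group(2)
        else:
            all_paths.append(match.group(2))
    return current, all_paths


def point_to_previous(checkpoint_dir, state_name="checkpoint"):
    """Rewrite the state file so it points at the checkpoint saved just
    before the current one. A backup copy (<state_name>.bak) is written first.
    Returns the path the state file now points to."""
    state_file = Path(checkpoint_dir) / state_name
    current, all_paths = read_state(state_file)
    if current not in all_paths or all_paths.index(current) == 0:
        raise ValueError("no earlier checkpoint is listed in the state file")
    previous = all_paths[all_paths.index(current) - 1]

    # Keep the original state file so the change can be undone.
    shutil.copy2(state_file, state_file.with_name(state_name + ".bak"))

    # Replace only the "current checkpoint" line; leave everything else as is.
    lines = state_file.read_text().splitlines()
    rewritten = [
        f'model_checkpoint_path: "{previous}"'
        if line.startswith("model_checkpoint_path:")
        else line
        for line in lines
    ]
    state_file.write_text("\n".join(rewritten) + "\n")
    return previous
```

Calling `point_to_previous(...)` with the directory passed to `--checkpoint_dir` and then rerunning the training command should make the restore step pick up the older checkpoint, provided the script reads that state file. If the older checkpoint fails with the same checksum error, the problem is probably in the storage (for example a full disk or an interrupted write) and not in one particular checkpoint.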

...