Keras (TensorFlow) crashes and throws a ValueError
06 May 2020

I am using Keras for regression, but when I run the model I get two errors:

1:

E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1329] function cupti_interface_->EnableCallback( 0 , subscriber_, CUPTI_CB_DOMAIN_DRIVER_API, cbid) failed with error CUPTI could not be loaded or symbol could not be found.

2:

ValueError: slice index 0 of dimension 0 out of bounds. for 'sequential/gru/strided_slice_2' 
(op: 'StridedSlice') with input shapes: [0,?,279], [1], [1], [1] and with computed input 
tensors: input[1] = <0>, input[2] = <1>, input[3] = <1>.
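
A quick way to narrow down error 1: the full log further down also warns that the dynamic library 'libcupti.so.10.1' could not be opened, and the cupti_tracer messages come from TensorFlow's profiler code rather than from the training step itself. The following is a minimal sketch, using only Python's standard ctypes module, that checks whether the dynamic loader can open that library from the current environment. The helper name can_load_shared_library is made up for this illustration.

```python
import ctypes


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the file is missing or not on the loader's search path.
        return False
    return True


if __name__ == "__main__":
    # Library name copied from the dso_loader warning in the log.
    lib = "libcupti.so.10.1"
    print(lib, "loadable:", can_load_shared_library(lib))
```

If this reports False, the library is either not installed or not on the loader's search path. On many CUDA installations CUPTI sits in the toolkit's extras/CUPTI/lib64 directory, though its location on this particular machine is an assumption.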

Error 1 occurs during the first epoch. I believe it is related to the GPU not having the right software installed.

Error 2 occurs after training ends, when the script cannot find my checkpoint file. The file is never created because every loss value is NaN instead of a real number.
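
Since the loss is already nan on the first step of the first epoch, one thing worth ruling out is NaN or infinite values in the input or target arrays. Whether that is the cause here is an assumption, as the data itself is not shown. A minimal NumPy sketch of such a check (count_non_finite is a hypothetical helper name, and x_demo is toy data standing in for arrays such as x_train_scaled and y_train):

```python
import numpy as np


def count_non_finite(name, array):
    """Print a one-line report and return how many entries are NaN or +/-inf."""
    array = np.asarray(array, dtype=float)
    bad = int((~np.isfinite(array)).sum())
    print(f"{name}: shape={array.shape}, non-finite values={bad}")
    return bad


if __name__ == "__main__":
    # Toy stand-in for the scaled training data; the real arrays come from the project.
    x_demo = np.array([[0.1, 0.2], [np.nan, 0.4]])
    count_non_finite("x_demo", x_demo)
```

Running it on the real scaled arrays before calling fit would show whether the NaN values are present before training starts.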

Full log (minus the package errors, so that I could post it):

Train for 100 steps, validate on 1 samples
Epoch 1/20
2020-05-06 14:37:46.721382: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-06 14:37:46.880683: I tensorflow/core/profiler/lib/profiler_session.cc:225] Profiler session started.
2020-05-06 14:37:46.880708: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1259] Profiler found 1 GPUs
2020-05-06 14:37:46.880871: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcupti.so.10.1'; dlerror: libcupti.so.10.1: cannot open shared object file: No such file or directory
2020-05-06 14:37:46.880880: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1307] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI could not be loaded or symbol could not be found.
2020-05-06 14:37:46.880885: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1346] function cupti_interface_->ActivityRegisterCallbacks( AllocCuptiActivityBuffer, FreeCuptiActivityBuffer)failed with error CUPTI could not be loaded or symbol could not be found.
  1/100 [..............................] - ETA: 2:00 - loss: nan2020-05-06 14:37:46.920527: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1329] function cupti_interface_->EnableCallback( 0 , subscriber_, CUPTI_CB_DOMAIN_DRIVER_API, cbid)failed with error CUPTI could not be loaded or symbol could not be found.
 99/100 [============================>.] - ETA: 0s - loss: nan  gpu/device_tracer.cc:88]  GpuTracer has collected 0 callback api events and 0 activity events.
100/100 [==============================] - 5s 52ms/step - loss: nan - val_loss: nan
Epoch 2/20
 99/100 [============================>.] - ETA: 0s - loss: nan
100/100 [==============================] - 4s 38ms/step - loss: nan - val_loss: nan
Epoch 3/20
 99/100 [============================>.] - ETA: 0s - loss: nan
100/100 [==============================] - 4s 38ms/step - loss: nan - val_loss: nan
Epoch 4/20
 99/100 [============================>.] - ETA: 0s - loss: nan
100/100 [==============================] - 4s 38ms/step - loss: nan - val_loss: nan
Epoch 5/20
 99/100 [============================>.] - ETA: 0s - loss: nan
100/100 [==============================] - 4s 38ms/step - loss: nan - val_loss: nan
Epoch 00005: early stopping
2020-05-06 14:38:06.066407: W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled
Error trying to load checkpoint.
Unable to open file (unable to open file: name = 'checkpoint.keras', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
1/1 [==============================] - 0s 2ms/sample - loss: nan
loss (test-set): nan
Traceback (most recent call last):
  File "/home/matthew/PycharmProjects/pollution/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1619, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: slice index 0 of dimension 0 out of bounds. for 'sequential/gru/strided_slice_2' (op: 'StridedSlice') with input shapes: [0,?,279], [1], [1], [1] and with computed input tensors: input[1] = <0>, input[2] = <1>, input[3] = <1>.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./manage.py", line 21, in <module>
    main()
  File "./manage.py", line 17, in main
    execute_from_command_line(sys.argv)
  File "/home/matthew/PycharmProjects/pollution/venv/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/home/matthew/PycharmProjects/pollution/venv/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/matthew/PycharmProjects/pollution/venv/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/matthew/PycharmProjects/pollution/venv/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/home/matthew/PycharmProjects/pollution/smog/management/commands/tensorflow_build.py", line 8, in handle
    build_model()
  File "/home/matthew/PycharmProjects/pollution/smog/business/tensorflow_building/building_tensorflow_model.py", line 372, in build_model
    plot_comparison(start_idx=100000, length=1000, train=True,x_train_scaled=x_train_scaled, x_test_scaled=x_test_scaled, y_train=y_train,y_test=y_test,y_scaler=y_scaler,model=model)
  File "/home/matthew/PycharmProjects/pollution/smog/business/tensorflow_building/building_tensorflow_model.py", line 134, in plot_comparison
    y_pred = model.predict(x)
  File "/home/matthew/PycharmProjects/pollution/venv/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 1013, in predict
    use_multiprocessing=use_multiprocessing)
  File "/home/matthew/PycharmProjects/pollution/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1622, in _create_c_op
    raise ValueError(str(e))
ValueError: slice index 0 of dimension 0 out of bounds. for 'sequential/gru/strided_slice_2' (op: 'StridedSlice') with input shapes: [0,?,279], [1], [1], [1] and with computed input tensors: input[1] = <0>, input[2] = <1>, input[3] = <1>.
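
The input shape [0,?,279] in the message means the tensor reaching the GRU layer has size 0 along its first dimension, so the array passed to model.predict appears to be empty along one axis. One possible cause, assumed here because plot_comparison is not shown, is that start_idx=100000 lies beyond the end of the array: Python and NumPy slicing past the end returns an empty result instead of raising. A small sketch of a bounds check (checked_window is a hypothetical helper, not part of Keras):

```python
def checked_window(n_samples, start_idx, length):
    """Return (start, end) for a slice of `length` rows, or raise if it does not fit."""
    if start_idx < 0 or length <= 0:
        raise ValueError("start_idx must be >= 0 and length must be > 0")
    end_idx = start_idx + length
    if end_idx > n_samples:
        raise ValueError(
            f"window [{start_idx}:{end_idx}] does not fit in {n_samples} samples"
        )
    return start_idx, end_idx


if __name__ == "__main__":
    rows = list(range(50))  # toy data with far fewer than 100000 rows
    print(len(rows[100000:101000]))  # prints 0: slicing past the end is silent
    try:
        checked_window(len(rows), 100000, 1000)
    except ValueError as exc:
        print("rejected:", exc)
```

Calling such a check with the length of the real array before slicing would turn a silent empty slice into an explicit error ahead of model.predict.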

I can't find any further details online about what could cause this error. Any help is appreciated, as this is for a school project.
