I'm using a script written for TF 2.0 to generate predictions from a pretrained BERT base model for an NLP task. I'm running Python 3.7 and TF 2.1 in a Google Colab notebook on a Cloud TPU-hosted runtime. On a cloud GPU the script runs without errors and produces predictions. On the TPU (after enabling the TPU and pointing the script at the TPU's IP address), I get the following error messages:
2020-02-09 01:17:36.155906: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-02-09 01:17:36.156040: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-02-09 01:17:36.156061: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
WARNING:tensorflow:From tf_kaggle_test.py:188: The name tf.estimator.tpu.InputPipelineConfig is deprecated. Please use tf.compat.v1.estimator.tpu.InputPipelineConfig instead.
WARNING:tensorflow:From tf_kaggle_test.py:189: The name tf.estimator.tpu.RunConfig is deprecated. Please use tf.compat.v1.estimator.tpu.RunConfig instead.
WARNING:tensorflow:From tf_kaggle_test.py:194: The name tf.estimator.tpu.TPUConfig is deprecated. Please use tf.compat.v1.estimator.tpu.TPUConfig instead.
WARNING:tensorflow:From tf_kaggle_test.py:212: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7f55bb7600d0>) includes params argument, but params are not passed to Estimator.
FLAGS.predict_file data/simplified-nq-test.jsonl
***** Running predictions *****
Num orig examples = 346
Num split examples = 9409
Batch size = 8
Num split into 3 = 8
.
.
Num split into 187 = 1
output/eval.tf_record
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1635: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
2020-02-09 01:18:52.589210: W tensorflow/core/distributed_runtime/rpc/grpc_session.cc:373] GrpcSession::ListDevices will initialize the session with an empty graph and other defaults because the session has not yet been created.
WARNING:tensorflow:From /content/gdrive/My Drive/Capstone - Google QA/natural-questions-nlp-drive/tf2_0_baseline_w_bert.py:1112: map_and_batch (from tensorflow.python.data.experimental.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map(map_func, num_parallel_calls)` followed by `tf.data.Dataset.batch(batch_size, drop_remainder)`. Static tf.data optimizations will take care of using the fused implementation.
2020-02-09 01:18:53.890592: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/tracking/util.py:1262: NameBasedSaverStatus.__init__ (from tensorflow.python.training.tracking.util) is deprecated and will be removed in a future version.
Instructions for updating:
Restoring a name-based tf.train.Saver checkpoint using the object-based restore API. This mode uses global names to match variables, and so is somewhat fragile. It also adds new restore ops to the graph each time it is called when graph building. Prefer re-encoding training checkpoints in the object-based format: run save() on the object-based saver (the same one this message is coming from) and use that checkpoint in the future.
WARNING:tensorflow:From /content/gdrive/My Drive/Capstone - Google QA/natural-questions-nlp-drive/tf2_0_baseline_w_bert.py:1057: The name tf.estimator.tpu.TPUEstimatorSpec is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimatorSpec instead.
All of the warnings above are harmless and execution continues past them. Many of them are deprecation warnings, because the original script was written for TF 1.0 and later ported to TF 2.0. The problem and the crash appear to happen in the tpu_estimator and error_handling modules (see the traceback below) and seem to be related to how exceptions are caught. I'm not sure what it means by "AttributeError: 'NameError' object has no attribute 'op'" and "name 'assignment_map' is not defined".
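The order of the two errors in the traceback below can be reproduced in plain Python. The sketch is a simplified, hypothetical model of the error-recording "rendezvous" in error_handling.py: the class and function names are made up, and the constant's value is an assumption. It shows that the original exception is the NameError, and that the AttributeError only arises because the handler reads `.op`, an attribute that plain Python exceptions do not have.

```python
import sys

# Assumed name and value of the op-type constant, for illustration only.
CHECK_NUMERIC_OP_NAME = "CheckNumerics"


class MiniRendezvous:
    """Hypothetical, simplified stand-in for error_handling's rendezvous."""

    def __init__(self):
        self.errors = {}

    def record_error(self, source, exc_info):
        _, value, _ = exc_info
        self.errors[source] = exc_info  # the original error is stored first
        # Mirrors the check at error_handling.py line 81: TensorFlow op errors
        # carry `.op`, a plain Python NameError does not -> AttributeError.
        return bool(value and value.op and value.op.type == CHECK_NUMERIC_OP_NAME)

    def raise_errors(self):
        # Re-raise whatever was recorded (the six.reraise step in the traceback).
        for _, value, tb in self.errors.values():
            raise value.with_traceback(tb)


def tpu_scaffold_stub():
    # Hypothetical stand-in for tpu_scaffold(): this name is never bound.
    return assignment_map  # noqa: F821 (deliberately undefined)


def predict():
    rendezvous = MiniRendezvous()
    try:
        tpu_scaffold_stub()
    except Exception:
        # Raises AttributeError while inspecting the NameError.
        rendezvous.record_error("prediction_loop", sys.exc_info())
    finally:
        # Runs while the AttributeError is in flight and re-raises the NameError.
        rendezvous.raise_errors()
```

Calling `predict()` ends with the NameError, whose `__context__` is the AttributeError. That is why Python prints the AttributeError first, then "During handling of the above exception, another exception occurred", then the NameError. If this model matches what TensorFlow does, the AttributeError is only a side effect and the NameError is the error to fix.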
WARNING:tensorflow:Reraising captured error
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3075, in predict
    rendezvous.record_error('prediction_loop', sys.exc_info())
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 81, in record_error
    if value and value.op and value.op.type == _CHECK_NUMERIC_OP_NAME:
AttributeError: 'NameError' object has no attribute 'op'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tf_kaggle_test.py", line 267, in <module>
    predict_input_fn, yield_single_examples=True):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3078, in predict
    rendezvous.raise_errors()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 143, in raise_errors
    six.reraise(typ, value, traceback)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3072, in predict
    yield_single_examples=yield_single_examples):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 626, in predict
    features, None, ModeKeys.PREDICT, self.config)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2857, in _call_model_fn
    config)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1152, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3394, in _model_fn
    scaffold = _get_scaffold(scaffold_fn)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3749, in _get_scaffold
    scaffold = scaffold_fn()
  File "/content/gdrive/My Drive/Capstone - Google QA/natural-questions-nlp-drive/tf2_0_baseline_w_bert.py", line 994, in tpu_scaffold
    tf.compat.v1.train.init_from_checkpoint(init_checkpoint, assignment_map)
NameError: name 'assignment_map' is not defined
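The last frame shows that the NameError is raised inside `tpu_scaffold` (line 994), that is, only when TPUEstimator calls `scaffold_fn()` through `_get_scaffold`. Python looks up a function's free names when the function is called, not when it is defined, so a missing `assignment_map` goes unnoticed until that call happens. If the GPU run never invokes `tpu_scaffold`, that would explain why only the TPU run fails. Below is a minimal sketch of the usual shape of the fix. It assumes the script intends `assignment_map` to be computed from the checkpoint before `tpu_scaffold` is defined; in the original google-research BERT code this is done with `modeling.get_assignment_map_from_checkpoint`. Every name other than `assignment_map`, `init_checkpoint` and `tpu_scaffold` is hypothetical, and the TensorFlow call is only mentioned in a comment.

```python
from collections import OrderedDict


def get_assignment_map_stub(var_names, checkpoint_names):
    """Hypothetical stand-in for a checkpoint-to-variable name mapping helper."""
    assignment_map = OrderedDict()
    for name in var_names:
        if name in checkpoint_names:
            # Restore the checkpoint variable into the same-named model variable.
            assignment_map[name] = name
    return assignment_map


def build_scaffold_fn(init_checkpoint, var_names, checkpoint_names):
    """Return a tpu_scaffold-style closure, or None if there is no checkpoint."""
    if not init_checkpoint:
        return None
    # Bind assignment_map in the enclosing scope *before* defining the closure;
    # without this line the closure fails only when it is finally called.
    assignment_map = get_assignment_map_stub(var_names, checkpoint_names)

    def tpu_scaffold():
        # The real script would call
        # tf.compat.v1.train.init_from_checkpoint(init_checkpoint, assignment_map)
        # here; this sketch just returns the arguments that call would receive.
        return init_checkpoint, assignment_map

    return tpu_scaffold
```

Calling the returned function no longer raises, because `assignment_map` is now a bound variable of the enclosing function rather than an undefined global.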
The notebook I took the script from (it worked fine on GPU/CPU) is here: https://www.kaggle.com/abhinand05/bert-for-humans-tutorial-baseline/data#Code-Implementation-in-Tensorflow-2.0
Is this related to something specific to Google Colab that I need to change, or do additional changes need to be made to run the script on a TPU?
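On the Colab side, the TPU address does not have to be hard-coded. The sketch below assumes the runtime exposes the TPU as `host:port` in the `COLAB_TPU_ADDR` environment variable, as Colab TPU runtimes did at the time. The helper name `colab_tpu_address` is hypothetical.

```python
import os


def colab_tpu_address(env=None):
    """Build the gRPC address of a Colab-hosted Cloud TPU.

    Assumes COLAB_TPU_ADDR holds "host:port"; returns None when it is unset,
    e.g. on a GPU or CPU runtime.
    """
    env = os.environ if env is None else env
    addr = env.get("COLAB_TPU_ADDR")
    if not addr:
        return None
    return "grpc://" + addr
```

The resulting string is the kind of value one would pass as `tpu=` to `tf.distribute.cluster_resolver.TPUClusterResolver`. That resolver then goes into `tf.compat.v1.estimator.tpu.RunConfig(cluster=...)`. The sketch itself does not exercise those TensorFlow calls.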