Universal Sentence Encoder embedding error: allocator cpu resource exhausted
1 vote
/ 19 June 2020

Given the following code:

import tensorflow_hub as hub
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # run the TF2 install in TF1 graph/session mode
with tf.Session() as sess:
  # load the Universal Sentence Encoder v4 SavedModel from TF Hub
  embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
  sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
  sess.run(embed(["test message"]))
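
For reference, the same embeddings can also be obtained without the TF1 session plumbing, in plain TF2 eager mode, which is roughly the usage documented for this TF2 SavedModel. This is only a sketch of the alternative call style, not a fix for the memory error, since it loads the same checkpoint:

import tensorflow_hub as hub

# TF2 eager-mode sketch: load USE v4 and call it directly,
# without tf.compat.v1 sessions or explicit initializers.
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
embeddings = embed(["test message"])  # eager tensor of shape (1, 512)
print(embeddings.shape)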

I can run the original snippet on my local Ubuntu machine (i5 CPU, 3 GB of RAM), but when I try to run it on a CentOS VPS with 4 GB of RAM (uname: Linux 2.6.32-042stab141.3 MSK 2019 x86_64), I get the following error:

 2020-06-19 14:15:10.612770: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Resource exhausted: OOM when allocating tensor with shape[26667,320] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
Traceback (most recent call last):
  File "/home/userM/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/userM/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1349, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/home/userM/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1441, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[26667,320] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
         [[{{node RestoreV2}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "testEncoder.py", line 12, in <module>
    get_features(["one message"])
  File "testEncoder.py", line 5, in get_features
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
  File "/home/userM/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 957, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "/home/userM/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1180, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "/home/userM/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1358, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "/home/userM/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[26667,320] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
         [[node RestoreV2 (defined at /home/userM/.local/lib/python3.8/site-packages/tensorflow_hub/module_v2.py:102) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Original stack trace for 'RestoreV2':
  File "testEncoder.py", line 10, in <module>
    embed=hub.load("universal_sentence_encoder")
  File "/home/userM/.local/lib/python3.8/site-packages/tensorflow_hub/module_v2.py", line 102, in load
    obj = tf_v1.saved_model.load_v2(module_path, tags=tags)
  File "/home/userM/.local/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 578, in load
    return load_internal(export_dir, tags)
  File "/home/userM/.local/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 602, in load_internal
    loader = loader_cls(object_graph_proto,
  File "/home/userM/.local/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 124, in __init__
    self._restore_checkpoint()
  File "/home/userM/.local/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 310, in _restore_checkpoint
    load_status = saver.restore(variables_path)
  File "/home/userM/.local/lib/python3.8/site-packages/tensorflow/python/training/tracking/util.py", line 1303, in restore
    base.CheckpointPosition(
  File "/home/userM/.local/lib/python3.8/site-packages/tensorflow/python/training/tracking/base.py", line 209, in restore
    restore_ops = trackable._restore_from_checkpoint_position(self)  # pylint: disable=protected-access
  File "/home/userM/.local/lib/python3.8/site-packages/tensorflow/python/training/tracking/base.py", line 906, in _restore_from_checkpoint_position
    current_position.checkpoint.restore_saveables(
  File "/home/userM/.local/lib/python3.8/site-packages/tensorflow/python/training/tracking/util.py", line 288, in restore_saveables
    new_restore_ops = functional_saver.MultiDeviceSaver(
  File "/home/userM/.local/lib/python3.8/site-packages/tensorflow/python/training/saving/functional_saver.py", line 281, in restore
    restore_ops.update(saver.restore(file_prefix))
  File "/home/userM/.local/lib/python3.8/site-packages/tensorflow/python/training/saving/functional_saver.py", line 95, in restore
    restored_tensors = io_ops.restore_v2(
  File "/home/userM/.local/lib/python3.8/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1503, in restore_v2
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "/home/userM/.local/lib/python3.8/site-packages/tensorflow/python/framework/op_def_library.py", line 742, in _apply_op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "/home/userM/.local/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 3319, in _create_op_internal
    ret = Operation(
  File "/home/userM/.local/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1791, in __init__
    self._traceback = tf_stack.extract_stack()
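
Following the report_tensor_allocations_upon_oom hint in the error text, this is a minimal sketch of how that flag can be passed through RunOptions to the failing sess.run calls (same TF1-style session as in the snippet above), so that the next OOM also prints the currently allocated tensors:

import tensorflow_hub as hub
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

with tf.Session() as sess:
  embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
  # Passing RunOptions makes an OOM here dump the list of live allocations.
  sess.run([tf.global_variables_initializer(), tf.tables_initializer()],
           options=run_options)
  print(sess.run(embed(["test message"]), options=run_options))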

I have read that clearing the session after training a model with tf.keras.backend.clear_session() can be a fix, but that is not the case here. Any ideas what could be causing this error, and how to fix it?
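
One thing worth checking, given that the failure happens only on the VPS, is how much memory is actually visible and available to the Python process there. Below is a small, hypothetical check using psutil (it is not part of the script above); on an OpenVZ kernel such as 2.6.32-042stab, the memory the container can actually use may be lower than the nominal 4 GB:

import psutil  # assumed to be installed separately: pip install psutil

# Compare these numbers between the local Ubuntu machine and the VPS.
vm = psutil.virtual_memory()
print("total MiB:    ", vm.total // (1024 * 1024))
print("available MiB:", vm.available // (1024 * 1024))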

Thanks
