tf.lookup.StaticVocabularyTable with num_oov_buckets does not work in TF Serving
1 vote
/ 03 July 2019

I created a TF model that uses tf.lookup.StaticVocabularyTable to build a vocab map inside the TF Graph. It reads the mapping from a text file and sets num_oov_buckets=500. Below is part of the code -

import tensorflow as tf

num_oov_buckets = 500
# resmap.txt: space-delimited int64 pairs, column 0 is the key, column 1 is the value
table_init = tf.lookup.TextFileInitializer('resmap.txt', tf.int64, 0, tf.int64, 1, delimiter=" ")
table = tf.lookup.StaticVocabularyTable(table_init, num_oov_buckets)

With this in place, everything works fine during training and prediction.
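
For reference, here is a minimal sketch of how lookups through this table behave in a session (assuming 18788418 is a key present in resmap.txt and 9 is not, as in the example request further down):

import tensorflow as tf

keys = tf.constant([18788418, 9], dtype=tf.int64)  # one in-vocab key, one unknown key
ids = table.lookup(keys)

with tf.Session() as sess:
    sess.run(tf.tables_initializer())
    # In-vocab keys map to their column-1 values from resmap.txt; unknown keys
    # are hashed into one of the num_oov_buckets extra buckets instead of failing.
    print(sess.run(ids))
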
I convert this TF model for Tensorflow Serving using the following code -

import os

import tensorflow as tf
from model import ModelWDN

with tf.Session() as sess:

    tf.app.flags.DEFINE_string('f', '', 'kernel')
    tf.app.flags.DEFINE_integer('model_version', 1, 'version number of the model.')
    tf.app.flags.DEFINE_string('save_dir', '/home/abhilash', 'Saving directory.')
    FLAGS = tf.app.flags.FLAGS

    export_path = os.path.join(tf.compat.as_bytes(FLAGS.save_dir), tf.compat.as_bytes(str(FLAGS.model_version)))
    print('Exporting trained model to', export_path)

    # Creating Model object and initializing all the global variables in TF Graph.
    model = ModelWDN(res_count=21663)
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    sess.run(tf.tables_initializer())

    tf.train.Saver().restore(sess, os.path.join('/home/abhilash', 'wdn'))
    print("Model restored.")

    # SavedModel Builder Object
    builder = tf.saved_model.builder.SavedModelBuilder(export_path)

    # Converting Tensor to TensorInfo Objects so that they can be used in SignatureDefs
    tensor_info_click_hist_str = tf.saved_model.utils.build_tensor_info(model.click_hist_str)
    tensor_info_res_to_predict_str = tf.saved_model.utils.build_tensor_info(model.res_to_predict_str)
    tensor_info_prob = tf.saved_model.utils.build_tensor_info(model.logits_all)

    # SignatureDef
    prediction_signature = (
          tf.saved_model.signature_def_utils.build_signature_def(
              inputs={'click_hist_str':tensor_info_click_hist_str,
                      'res_to_predict_str':tensor_info_res_to_predict_str},
              outputs={'probs': tensor_info_prob},
              method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME))

    builder.add_meta_graph_and_variables(
                sess=sess,
                tags=[tf.saved_model.tag_constants.SERVING],
                signature_def_map={'predict_ad_view_prob': prediction_signature},
                main_op=tf.tables_initializer(), 
                strip_default_attrs=False,
                )

    # Export the model
    builder.save()
    print('Done exporting TF Model to SavedModel format!')
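
Before pointing TF Serving at the export, one sanity check is to load the SavedModel back in a fresh session and call the signature directly (a sketch, reusing the export_path and signature key from above; the input strings are the same example values as in the curl request below):

import tensorflow as tf

with tf.Session(graph=tf.Graph()) as sess:
    meta_graph = tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING], export_path)
    sig = meta_graph.signature_def['predict_ad_view_prob']
    feed_dict = {
        sig.inputs['click_hist_str'].name: ["18198449 18656271 18198449"],
        sig.inputs['res_to_predict_str'].name: ["9 18788418 19039855 18771619"],
    }
    # loader.load runs the saved main_op, so the vocab table gets initialized here.
    probs = sess.run(sig.outputs['probs'].name, feed_dict=feed_dict)
    print(probs)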

It converts without any errors and gives the correct prediction when I pass any value that exists in resmap.txt, the file I supplied when defining tf.lookup.TextFileInitializer. Any value that does not exist in that map raises an error in serving when making a curl request, but causes no error otherwise (i.e., when predicting with the TF model inside a session). The curl request -

curl -X POST http://localhost:8501/v1/models/1:predict -d '{"signature_name": "predict_ad_view_prob", "inputs":{"res_to_predict_str": ["9 18788418 19039855 18771619"], "click_hist_str": ["18198449 18656271 18198449"]}}'

Here 9 is an ID that is not present in resmap.txt.

Below is the error I get when making the curl request -

{ "error": "indices[0] = 21748 is not in [0, 21663)\n\t [[{{node GatherV2_5}}]]" }

resmap.txt has 21663 key-value pairs, and num_oov_buckets is set to 500.
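
Since OOV keys are hashed into ids in [vocabulary size, vocabulary size + num_oov_buckets), that range here is [21663, 22163), and the index 21748 from the serving error above falls inside it. A quick check in a session (reusing the table defined at the top; 9 is again a key absent from resmap.txt):

oov_id = table.lookup(tf.constant([9], dtype=tf.int64))
with tf.Session() as sess:
    sess.run(tf.tables_initializer())
    print(sess.run(oov_id))  # some id in [21663, 22163)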

The same input, when predicted inside a TF session, gives the correct result -

[[0.10621755 0.50749264 0.08582641 0.00173556]]

So clearly there is some issue with num_oov_buckets and how the graph implements it in serving, or, if I am missing something / building the TF SavedModel incorrectly, please let me know.

UPDATE - adding the saved_model_cli show and run commands

saved_model_cli show --dir 1 --all
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['predict_ad_view_prob']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['click_hist_str'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: Placeholder_3:0
    inputs['res_to_predict_str'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: Placeholder_5:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['probs'] tensor_info:
        dtype: DT_DOUBLE
        shape: (-1, -1)
        name: Sigmoid:0
  Method name is: tensorflow/serving/predict
saved_model_cli run --dir 1 --tag_set serve --signature_def predict_ad_view_prob --input_exprs 'click_hist_str=["50 50"];res_to_predict_str=["50 303960 1 2"]'
2019-07-18 10:18:54.805220: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-07-18 10:18:54.810121: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-18 10:18:54.811041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
2019-07-18 10:18:54.811492: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-18 10:18:54.813643: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-07-18 10:18:54.815415: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-07-18 10:18:54.815914: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-07-18 10:18:54.818528: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-07-18 10:18:54.820856: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-07-18 10:18:54.826085: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-07-18 10:18:54.826234: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-18 10:18:54.827152: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-18 10:18:54.827807: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-07-18 10:18:54.828138: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-18 10:18:54.856561: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300065000 Hz
2019-07-18 10:18:54.857004: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5635e1749450 executing computations on platform Host. Devices:
2019-07-18 10:18:54.857037: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-07-18 10:18:54.984822: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-18 10:18:54.985784: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5635e36188b0 executing computations on platform CUDA. Devices:
2019-07-18 10:18:54.985823: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Tesla K80, Compute Capability 3.7
2019-07-18 10:18:54.986072: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-18 10:18:54.987021: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
2019-07-18 10:18:54.987099: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-18 10:18:54.987152: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-07-18 10:18:54.987202: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-07-18 10:18:54.987250: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-07-18 10:18:54.987300: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-07-18 10:18:54.987362: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-07-18 10:18:54.987413: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-07-18 10:18:54.987554: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-18 10:18:54.988526: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-18 10:18:54.989347: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-07-18 10:18:54.989418: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-18 10:18:54.995160: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-18 10:18:54.995475: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-07-18 10:18:54.995629: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-07-18 10:18:54.995938: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-18 10:18:54.996963: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-18 10:18:54.997884: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8895 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
WARNING: Logging before flag parsing goes to stderr.
W0718 10:18:54.999173 140274532570944 deprecation.py:323] From /home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/tools/saved_model_cli.py:339: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
W0718 10:18:55.271977 140274532570944 deprecation.py:323] From /home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2019-07-18 10:18:56.953677: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
2019-07-18 10:18:56.979903: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
Result for output key probs:
[[0.14920072 0.07349582 0.12342736 0.12342736]]

1 Answer

0 votes
/ 18 July 2019

If I understand correctly, you are saving the model under the path /home/abhilash/1. So the model name in the curl command should be abhilash instead of 1, since we should not include the version number.

The syntax of the curl command you are using is incorrect, at least according to the documentation shown at this link: https://www.tensorflow.org/tfx/serving/docker.

It should be something like the one below:

curl -d '{"signature_name": "predict_ad_view_prob", "inputs":{"res_to_predict_str": ["9 18788418 19039855 18771619"], "click_hist_str": ["18198449 18656271 18198449"]}}' -X POST http://localhost:8501/v1/models/abhilash:predict
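
Equivalently, you can send the same request from Python (a sketch assuming the server was started with the model name abhilash and the REST API on port 8501):

import requests

payload = {
    "signature_name": "predict_ad_view_prob",
    "inputs": {
        "res_to_predict_str": ["9 18788418 19039855 18771619"],
        "click_hist_str": ["18198449 18656271 18198449"],
    },
}
# POST to the predict endpoint of the model named "abhilash"
resp = requests.post("http://localhost:8501/v1/models/abhilash:predict", json=payload)
print(resp.json())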

If that does not work, please share the saved_model_cli show, docker run, and saved_model_cli run commands you are using, along with the corresponding outputs, so that we can pinpoint where exactly the problem is.
