ResourceExhaustedError when defining a Sequential model in Keras inside a TensorFlow XLA jit_scope context manager

Asked September 30, 2018

So, my problem is that I am trying to use XLA on the CPU through the Keras API built into TensorFlow 1.8, using tf.contrib.compiler.jit.experimental_jit_scope (this is the only way I know of to enable XLA on the CPU; enabling it via ConfigProto does not work on the CPU for me). For some strange reason I get a ResourceExhaustedError while trying to allocate 0 bytes. It looks like something is wrong either in TensorFlow or in Keras. Can someone help me get this working?
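For reference, the ConfigProto route mentioned above (the one that did not enable XLA on the CPU for me) looks roughly like this; a minimal sketch, assuming a TF 1.x session handed to the Keras backend before the model is built:

import tensorflow as tf

# Turn on the global XLA JIT level in the session config
# (as described above, this has no visible effect on the CPU for me).
config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1

# Register the configured session with the Keras backend before building the model
tf.keras.backend.set_session(tf.Session(config=config))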

My setup:

  • Intel i5-2430M, 6 GB RAM
  • Linux Ubuntu 16.04 | Python 3.6.5 | Bazel 0.16.0 | GCC 5.4.0
  • TensorFlow 1.8, compiled from source with MKL and XLA support

I opened an issue on the TensorFlow GitHub page, here, but was told that it does not look like a bug and that I should ask on StackOverflow instead, so here I am.

Below is the code I am using, followed by the full traceback.

Code

import tensorflow as tf
from tensorflow.python.client import timeline

import numpy as np

JIT_SCOPE = tf.contrib.compiler.jit.experimental_jit_scope

options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)                   
run_metadata = tf.RunMetadata()

(train_x, train_y), _ = tf.keras.datasets.mnist.load_data()

train_x = np.expand_dims(train_x, axis=-1) / 255.
train_y = tf.keras.utils.to_categorical(train_y)

with JIT_SCOPE():                                                               
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(16, (3, 3), activation="relu", input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPool2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax")
    ])

    model.compile("sgd", "categorical_crossentropy", options=options, run_metadata=run_metadata)

model.fit(train_x, train_y) # error happens at this moment

trace = timeline.Timeline(step_stats=run_metadata.step_stats)
with open("timeline.ctr.json", "w") as f:
    f.write(trace.generate_chrome_trace_format())
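(The timeline.ctr.json written by the last two lines is in the Chrome trace format and can be inspected by opening chrome://tracing in a Chromium-based browser; in the failing run the script never gets that far, since the error is raised by fit.)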

Traceback

Epoch 1/1
2018-08-15 20:28:54.784459: I tensorflow/compiler/xla/service/service.cc:159] XLA service 0x7f70ec071a30 executing computations on platform Host. Devices:
2018-08-15 20:28:54.784509: I tensorflow/compiler/xla/service/service.cc:167]   StreamExecutor device (0): <undefined>, <undefined>
2018-08-15 20:28:55.548381: E tensorflow/core/common_runtime/bfc_allocator.cc:246] tried to allocate 0 bytes
2018-08-15 20:28:55.548481: W tensorflow/core/common_runtime/allocator_retry.cc:32] Request to allocate 0 bytes
2018-08-15 20:28:55.561315: E tensorflow/core/common_runtime/bfc_allocator.cc:246] tried to allocate 0 bytes
2018-08-15 20:28:55.561365: W tensorflow/core/common_runtime/allocator_retry.cc:32] Request to allocate 0 bytes
---------------------------------------------------------------------------
ResourceExhaustedError                    Traceback (most recent call last)
~/Work/2018_Summer_CERN/tf_v_tmva/tf_opt/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1321     try:
-> 1322       return fn(*args)
   1323     except errors.OpError as e:

~/Work/2018_Summer_CERN/tf_v_tmva/tf_opt/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1306       return self._call_tf_sessionrun(
-> 1307           options, feed_dict, fetch_list, target_list, run_metadata)
   1308 

~/Work/2018_Summer_CERN/tf_v_tmva/tf_opt/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
   1408           self._session, options, feed_dict, fetch_list, target_list,
-> 1409           run_metadata)
   1410     else:

ResourceExhaustedError: Out of memory while trying to allocate 0 bytes.
     [[Node: cluster_1/_4/_5 = _XlaLaunch[Nresources=0, Targs=[], Tconstants=[], Tresults=[DT_FLOAT], function=cluster_1[_XlaCompiledKernel=true, _XlaNumConstantArgs=0, _XlaNumResourceArgs=0], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


During handling of the above exception, another exception occurred:

ResourceExhaustedError                    Traceback (most recent call last)
<ipython-input-46-dbab7a29ab1f> in <module>()
----> 1 model.fit(train_x[:1000], train_y[:1000], epochs=1)

~/Work/2018_Summer_CERN/tf_v_tmva/tf_opt/.venv/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
   1214           initial_epoch=initial_epoch,
   1215           steps_per_epoch=steps_per_epoch,
-> 1216           validation_steps=validation_steps)
   1217 
   1218   def evaluate(self,

~/Work/2018_Summer_CERN/tf_v_tmva/tf_opt/.venv/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/training_arrays.py in fit_loop(model, inputs, targets, sample_weights, batch_size, epochs, verbose, callbacks, val_inputs, val_targets, val_sample_weights, shuffle, callback_metrics, initial_epoch, steps_per_epoch, validation_steps)
    243           ins_batch[i] = ins_batch[i].toarray()
    244 
--> 245         outs = f(ins_batch)
    246         if not isinstance(outs, list):
    247           outs = [outs]

~/Work/2018_Summer_CERN/tf_v_tmva/tf_opt/.venv/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/backend.py in __call__(self, inputs)
   2797       feed_dict = {}
   2798 
-> 2799     session = get_session()
   2800     data_tensors_to_feed = []
   2801     for tensor, value in zip(self.inputs, inputs):

~/Work/2018_Summer_CERN/tf_v_tmva/tf_opt/.venv/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/backend.py in get_session()
    440   if not _MANUAL_VAR_INIT:
    441     with session.graph.as_default():
--> 442       _initialize_variables(session)
    443   return session
    444 

~/Work/2018_Summer_CERN/tf_v_tmva/tf_opt/.venv/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/backend.py in _initialize_variables(session)
    671       v._keras_initialized = True
    672     if uninitialized_vars:
--> 673       session.run(variables_module.variables_initializer(uninitialized_vars))
    674 
    675 

~/Work/2018_Summer_CERN/tf_v_tmva/tf_opt/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    898     try:
    899       result = self._run(None, fetches, feed_dict, options_ptr,
--> 900                          run_metadata_ptr)
    901       if run_metadata:
    902         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/Work/2018_Summer_CERN/tf_v_tmva/tf_opt/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1133     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1134       results = self._do_run(handle, final_targets, final_fetches,
-> 1135                              feed_dict_tensor, options, run_metadata)
   1136     else:
   1137       results = []

~/Work/2018_Summer_CERN/tf_v_tmva/tf_opt/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1314     if handle is None:
   1315       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1316                            run_metadata)
   1317     else:
   1318       return self._do_call(_prun_fn, handle, feeds, fetches)

~/Work/2018_Summer_CERN/tf_v_tmva/tf_opt/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1333         except KeyError:
   1334           pass
-> 1335       raise type(e)(node_def, op, message)
   1336 
   1337   def _extend_graph(self):

ResourceExhaustedError: Out of memory while trying to allocate 0 bytes.
     [[Node: cluster_1/_4/_5 = _XlaLaunch[Nresources=0, Targs=[], Tconstants=[], Tresults=[DT_FLOAT], function=cluster_1[_XlaCompiledKernel=true, _XlaNumConstantArgs=0, _XlaNumResourceArgs=0], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
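As an aside, the hint at the end of the traceback can be followed directly; a minimal sketch, assuming the same RunOptions object as in the code above, that also asks TensorFlow to report which tensors are alive when the OOM is raised:

options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE,
                        report_tensor_allocations_upon_oom=True)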