tensorflow.python.framework.errors_impl.InvalidArgumentError: node exception when disabling eager mode
0 votes
/ 03 August 2020

I wrote a model that inherits from tf.keras.models.Model and overrides its call() method. When I run it in eager mode, everything works fine. Now I am trying to improve performance (long runs with different parameters) by running in non-eager (graph) mode, but I get the error below.
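For reference, switching TF 2.x out of eager mode is typically done through the v1 compatibility API (an assumption; the post does not show how non-eager mode was enabled):

```python
import tensorflow as tf

# Must be called before any models, tensors, or graphs are created;
# afterwards Keras falls back to the session-based v1 training path
# (visible in the traceback as training_v1.py).
tf.compat.v1.disable_eager_execution()

print(tf.executing_eagerly())  # False once v1 graph mode is active
```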

Running TF 2.3 on Ubuntu 18.04.4 LTS (64-bit).

2020-08-03 11:15:49.852459: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-08-03 11:15:49.864526: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-08-03 11:15:49.864590: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (tau): /proc/driver/nvidia/version does not exist
2020-08-03 11:15:49.864990: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-08-03 11:15:49.876833: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2904005000 Hz
2020-08-03 11:15:49.877129: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5725890 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-03 11:15:49.877147: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-08-03 11:15:50.836795: W tensorflow/c/c_api.cc:326] Operation '{name:'train_full_model_cell/StatefulPartitionedCall' id:189 op device:{} def:{{{node train_full_model_cell/StatefulPartitionedCall}} = StatefulPartitionedCall[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_BOOL, ..., DT_RESOURCE, DT_RESOURCE, DT_RESOURCE, DT_RESOURCE, DT_RESOURCE], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _collective_manager_ids=[], _read_only_resource_inputs=[5, 6, 7, 8, 9, 10, 11, 12, 13, 14], config="", config_proto="\n\007\n\003CPU\020\001\n\007\n\003GPU\020\0002\002J\0008\001\202\001\000", executor_type="", f=__forward_call_1504[]](input_1, input_2, input_3, input_4, keras_learning_phase, emb_net_hid_0/kernel, emb_net_hid_0/bias, emb_net_hid_1/kernel, emb_net_hid_1/bias, base_net_hid_0/kernel, base_net_hid_0/bias, base_net_hid_1/kernel, base_net_hid_1/bias, Output_Layer/kernel, Output_Layer/bias)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
Traceback (most recent call last):
  File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _run_fn
    self._extend_graph()
  File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1388, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Node 'training/Adam/gradients/gradients/train_full_model_cell/StatefulPartitionedCall_grad/PartitionedCall': Connecting to invalid output 1 of source node train_full_model_cell/StatefulPartitionedCall which has 1 outputs.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/Projects/ForcesNN/train_full_model_cell.py", line 205, in <module>
    main(**vars(arguments))
  File "/mnt/Projects/ForcesNN/train_full_model_cell.py", line 173, in main
    callbacks=callbacks)
  File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_v1.py", line 809, in fit
    use_multiprocessing=use_multiprocessing)
  File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 666, in fit
    steps_name='steps_per_epoch')
  File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 206, in model_iteration
    val_iterator = _get_iterator(val_inputs, model._distribution_strategy)
  File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 542, in _get_iterator
    return training_utils.get_iterator(inputs)
  File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 1715, in get_iterator
    initialize_iterator(iterator)
  File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 1722, in initialize_iterator
    K.get_session((init_op,)).run(init_op)
  File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 630, in get_session
    _initialize_variables(session)
  File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 1053, in _initialize_variables
    [variables_module.is_variable_initialized(v) for v in candidate_vars])
  File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 958, in run
    run_metadata_ptr)
  File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1181, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Node 'training/Adam/gradients/gradients/train_full_model_cell/StatefulPartitionedCall_grad/PartitionedCall': Connecting to invalid output 1 of source node train_full_model_cell/StatefulPartitionedCall which has 1 outputs.

Process finished with exit code 1

Here is the relevant part of the code:

class Model(tf.keras.models.Model):
    def __init__(self, **parameters):
        super(Model, self).__init__()
        self.parameters = parameters  # Save model parameters


class TrainFullModelCell(Model):
    def __init__(self, feature_list, k, r_cs, r_c, base_dim_l, base_act_l, g_dim_l, g_act_l):
        super(TrainFullModelCell, self).__init__(feature_list=feature_list, k=k, r_cs=r_cs, r_c=r_c,
                                                 base_dim_l=base_dim_l, base_act_l=base_act_l,
                                                 g_dim_l=g_dim_l, g_act_l=g_act_l)

        self.G = self._embedding_net(k, g_dim_l, g_act_l)
        self.base_net = self._base_net(len(feature_list) * g_dim_l[-1], base_dim_l, base_act_l)

    def dataset_preprocess(self, ds, with_rotation_aug):
        """
        Runs all pre-processing mapping on dataset
        :param ds: Tensorflow dataset
        :param with_rotation_aug: Apply rotation augmentation of samples
        :return: Preprocessed dataset
        """
        # Generate parameter-dependent preprocess mapping functions
        find_neighbours_in_radius = preprocess.generate_find_neighbours_in_radius(self.parameters["r_c"])
        trim_neighbours = preprocess.generate_trim_neighbours(self.parameters["k"])
        neighbour_info_to_features = preprocess.generate_neighbour_info_to_features(self.parameters["feature_list"],
                                                                                    self.parameters["r_cs"],
                                                                                    self.parameters["r_c"])

        # Preprocess
        if with_rotation_aug:
            ds = ds.map(preprocess.remove_mean_of_pos_vectors)
            ds = ds.map(preprocess.rotation_augmentation)
        ds = ds.map(find_neighbours_in_radius)
        ds = ds.map(trim_neighbours)
        ds = ds.map(neighbour_info_to_features)

        # Create a zip of two datasets: inputs and labels.
        # Note: flat_map must return a tf.data.Dataset, so tf.data.Dataset.zip
        # is used here rather than Python's builtin zip.
        ds = ds.flat_map(lambda *args: tf.data.Dataset.zip(
            (tf.data.Dataset.from_tensor_slices(args[:-1]),
             tf.data.Dataset.from_tensor_slices(args[-1]))))
        return ds

    @tf.function
    def call(self, inputs, training=None, mask=None):
        X, s_r, _, _ = inputs
        s_r = tf.expand_dims(s_r, axis=-1)
        X = tf.transpose(self.G(s_r), [0, 2, 1]) @ X
        X = tf.keras.layers.Flatten()(X)
        return self.base_net(X)

    @staticmethod
    def _embedding_net(input_size, layer_dim_list, layer_act_list):
        """
        Create embedding network.
        We can't use a Keras Embedding layer because the inputs are not integers.
        Instead we use a Conv1D layer with the output dimension as the filter size and a kernel size of 1.
        This yields an output tensor of shape (batch_size, input_size, filter_size).
        To make it behave like an embedding layer, we transpose the resulting matrix afterwards.
        :param input_size: Size of the input dimension
        :param layer_dim_list: List of layer dimensions
        :param layer_act_list: List of layer activation functions
        :return: Embedding network
        """
        emb_net = tf.keras.Sequential(name="Embedding_Net")
        emb_net.add(tf.keras.layers.Input((int(input_size), 1), name='emb_net_input'))
        for i, (layer_i_dim, layer_i_act) in enumerate(zip(layer_dim_list, layer_act_list)):
            layer_i_act = tf.compat.as_text(layer_i_act)
            emb_net.add(tf.keras.layers.Conv1D(layer_i_dim, 1, activation=layer_i_act, name='emb_net_hid_{}'.format(i)))

        return emb_net

    @staticmethod
    def _base_net(input_size, layer_dim_list, layer_act_list):
        """
        Create base-net
        :param input_size: Size of the input dimension
        :param layer_dim_list: List of layer dimensions
        :param layer_act_list: List of layer activation functions
        :return: Sequential model
        """
        base_net = tf.keras.Sequential(name="Base_Net")
        base_net.add(tf.keras.layers.Input(int(input_size), name='base_net_input'))
        for i, (layer_i_dim, layer_i_act) in enumerate(zip(layer_dim_list, layer_act_list)):
            layer_i_act = tf.compat.as_text(layer_i_act)
            base_net.add(tf.keras.layers.Dense(layer_i_dim, activation=layer_i_act, name='base_net_hid_{}'.format(i)))
        base_net.add(tf.keras.layers.Dense(3, activation='linear', name='Output_Layer'))

        return base_net
...
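The traceback complains that the gradient op is wired to output 1 of the forward `StatefulPartitionedCall`, which only has 1 output. A plausible culprit (a guess, not confirmed by the post) is the `@tf.function` decorator on `call()`: with eager execution disabled, the v1 Keras training loop already traces `call()` into the session graph, and wrapping it in `tf.function` on top of that produces the extra `StatefulPartitionedCall` node that the gradient rewrite then mis-wires. A minimal sketch of the undecorated pattern, using a hypothetical toy model in place of `TrainFullModelCell`:

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()


class ToyModel(tf.keras.models.Model):
    """Hypothetical stand-in for TrainFullModelCell."""

    def __init__(self):
        super(ToyModel, self).__init__()
        self.dense = tf.keras.layers.Dense(3, activation='linear')

    # No @tf.function here: with eager execution disabled, Keras builds
    # the graph itself, so the decorator is redundant and can interfere
    # with the v1 gradient construction.
    def call(self, inputs, training=None, mask=None):
        return self.dense(inputs)
```

If the decorator was only there for speed in eager mode, `tf.function` can instead be applied to a custom training step, leaving `call()` plain.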