In my network structure I have a layer of class "rec" named "output". Inside the "unit" of this layer I have several sub-layers, one of which is "pivot_target_embed_raw". The layer "pivot_target_embed_raw" is loaded from a different checkpoint. Now I want to use the parameters of "pivot_target_embed_raw" also for my layer "source_embed_raw", which lives outside of the "output" unit, at the same "network depth" as "output" itself. In my config I have tried two things so far, both of which led to different errors:
1. 'reuse_params': {'map': {'W': {'reuse_layer': 'pivot_target_embed_raw'}, 'b': None}}
I am posting only part of the resulting error, because I think the problem is simply how "pivot_target_embed_raw" is referenced, so attempt 2 below is probably the more relevant one. For context, here is first a minimal sketch of the relevant structure (the full config is in the EDIT below):
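network = {
    # outside 'output', at the same depth; should share W with pivot_target_embed_raw
    'source_embed_raw': {'class': 'linear', 'n_out': 512, 'with_bias': False},
    'output': {
        'class': 'rec',
        'unit': {
            # loaded from another checkpoint via preload_from_files
            'pivot_target_embed_raw': {'class': 'linear', 'n_out': 512,
                                       'trainable': False, 'with_bias': False},
            # ... rest of the decoder ...
        },
    },
    # ... rest of the encoder ...
}
The (partial) error for attempt 1: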
File "/u/hilmes/returnn/TFNetworkLayer.py", line 448, in transform_config_dict
line: for src_name in src_names
locals:
src_name = <not found>
src_names = <local> ['source_embed_raw'], _[0]: {len = 16}
File "/u/hilmes/returnn/TFNetworkLayer.py", line 449, in <listcomp>
line: d["sources"] = [
get_layer(src_name)
for src_name in src_names
if not src_name == "none"]
locals:
d = <not found>
get_layer = <local> <function TFNetwork.construct_layer.<locals>.get_layer at 0x7f781e7a6d90>
src_name = <local> 'source_embed_raw', len = 16
src_names = <not found>
File "/u/hilmes/returnn/TFNetwork.py", line 607, in get_layer
line: return self.construct_layer(net_dict=net_dict, name=src_name) # set get_layer to wrap construct_layer
locals:
self = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
self.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
net_dict = <local> {'dec_03_att_key0': {'from': ['encoder'], 'class': 'linear', 'with_bias': False, 'n_out': 512, 'activation': None, 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)"}, 'enc_06_self_att_lin': {'from': ['enc_06_self_att_att'], 'class': 'linear',..., len = 98
name = <not found>
src_name = <local> 'source_embed_raw', len = 16
File "/u/hilmes/returnn/TFNetwork.py", line 652, in construct_layer
line: layer_class.transform_config_dict(layer_desc, network=self, get_layer=get_layer)
locals:
layer_class = <local> <class 'TFNetworkLayer.LinearLayer'>
layer_class.transform_config_dict = <local> <bound method LayerBase.transform_config_dict of <class 'TFNetworkLayer.LinearLayer'>>
layer_desc = <local> {'reuse_params': {'map': {'W': {'reuse_layer': 'pivot_target_embed_raw'}, 'b': None}}, 'with_bias': False, 'n_out': 512, 'sources': [<SourceLayer 'data:data' out_type=Data(shape=(None,), dtype='int32', sparse=True, dim=35356, batch_shape_meta=[B,T|'time:var:extern_data:data'])>], 'activation': None}
network = <not found>
self = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
get_layer = <local> <function TFNetwork.construct_layer.<locals>.get_layer at 0x7f781e7a6ea0>
File "/u/hilmes/returnn/TFNetworkLayer.py", line 456, in transform_config_dict
line: d["reuse_params"] = ReuseParams.from_config_dict(d["reuse_params"], network=network, get_layer=get_layer)
locals:
d = <local> {'reuse_params': {'map': {'W': {'reuse_layer': 'pivot_target_embed_raw'}, 'b': None}}, 'with_bias': False, 'n_out': 512, 'sources': [<SourceLayer 'data:data' out_type=Data(shape=(None,), dtype='int32', sparse=True, dim=35356, batch_shape_meta=[B,T|'time:var:extern_data:data'])>], 'activation': None}
ReuseParams = <global> <class 'TFNetworkLayer.ReuseParams'>
ReuseParams.from_config_dict = <global> <bound method ReuseParams.from_config_dict of <class 'TFNetworkLayer.ReuseParams'>>
network = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
get_layer = <local> <function TFNetwork.construct_layer.<locals>.get_layer at 0x7f781e7a6ea0>
File "/u/hilmes/returnn/TFNetworkLayer.py", line 1386, in from_config_dict
line: value["reuse_layer"] = optional_get_layer(value["reuse_layer"])
locals:
value = <local> {'reuse_layer': 'pivot_target_embed_raw'}
optional_get_layer = <local> <function ReuseParams.from_config_dict.<locals>.optional_get_layer at 0x7f781e7a6f28>
File "/u/hilmes/returnn/TFNetworkLayer.py", line 1362, in optional_get_layer
line: return get_layer(layer_name)
locals:
get_layer = <local> <function TFNetwork.construct_layer.<locals>.get_layer at 0x7f781e7a6ea0>
layer_name = <local> 'pivot_target_embed_raw', len = 22
File "/u/hilmes/returnn/TFNetwork.py", line 607, in get_layer
line: return self.construct_layer(net_dict=net_dict, name=src_name) # set get_layer to wrap construct_layer
locals:
self = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
self.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
net_dict = <local> {'dec_03_att_key0': {'from': ['encoder'], 'class': 'linear', 'with_bias': False, 'n_out': 512, 'activation': None, 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)"}, 'enc_06_self_att_lin': {'from': ['enc_06_self_att_att'], 'class': 'linear',..., len = 98
name = <not found>
src_name = <local> 'pivot_target_embed_raw', len = 22
File "/u/hilmes/returnn/TFNetwork.py", line 643, in construct_layer
line: raise LayerNotFound("layer %r not found in %r" % (name, self))
locals:
LayerNotFound = <global> <class 'TFNetwork.LayerNotFound'>
name = <local> 'pivot_target_embed_raw', len = 22
self = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
LayerNotFound: layer 'pivot_target_embed_raw' not found in <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
2. In a second attempt I changed the config to 'reuse_params': {'map': {'W': {'reuse_layer': 'output/pivot_target_embed_raw'}, 'b': None}}
Again I get a really long stack trace, starting with:
ReuseParams: layer 'output/pivot_target_embed_raw' does not exist yet and there is a dependency loop, thus creating it on dummy inputs now
Exception creating layer root/'source_embed_raw' of class LinearLayer with opts:
{'activation': None,
'n_out': 512,
'name': 'source_embed_raw',
'network': <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
'output': Data(name='source_embed_raw_output', shape=(None, 512), batch_shape_meta=[B,T|'time:var:extern_data:data',F|512]),
'reuse_params': <TFNetworkLayer.ReuseParams object at 0x7fcb3e959ac8>,
'sources': [<SourceLayer 'data:data' out_type=Data(shape=(None,), dtype='int32', sparse=True, dim=35356, batch_shape_meta=[B,T|'time:var:extern_data:data'])>],
'with_bias': False}
EXCEPTION
layer root/'source_embed_raw' output: Data(name='source_embed_raw_output', shape=(None, 512), batch_shape_meta=[B,T|'time:var:extern_data:data',F|512])
ReuseParams: layer 'output/pivot_target_embed_raw' does not exist yet and there is a dependency loop, thus creating it on dummy inputs now
Exception creating layer root/'source_embed_raw' of class LinearLayer with opts:
{'activation': None,
'n_out': 512,
'name': 'source_embed_raw',
'network': <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
'output': Data(name='source_embed_raw_output', shape=(None, 512), batch_shape_meta=[B,T|'time:var:extern_data:data',F|512]),
'reuse_params': <TFNetworkLayer.ReuseParams object at 0x7fcb3e60e7f0>,
'sources': [<SourceLayer 'data:data' out_type=Data(shape=(None,), dtype='int32', sparse=True, dim=35356, batch_shape_meta=[B,T|'time:var:extern_data:data'])>],
'with_bias': False}
Traceback (most recent call last):
and ending with:
File "/u/hilmes/opt/returnn/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1220, in get_variable
line: return var_store.get_variable(
full_name,
shape=shape,
dtype=dtype,
initializer=initializer,
regularizer=regularizer,
reuse=reuse,
trainable=trainable,
collections=collections,
caching_device=caching_device,
partitioner=partitioner,
validate_shape=validate_shape,
use_resource=use_resource,
custom_getter=custom_getter,
constraint=constraint,
synchronization=synchronization,
aggregation=aggregation)
locals:
var_store = <local> <tensorflow.python.ops.variable_scope._VariableStore object at 0x7fca58cac198>
var_store.get_variable = <local> <bound method _VariableStore.get_variable of <tensorflow.python.ops.variable_scope._VariableStore object at 0x7fca58cac198>>
full_name = <local> 'source_embed_raw/W', len = 18
shape = <local> (35356, 512)
dtype = <local> tf.float32
initializer = <local> <tensorflow.python.ops.init_ops.GlorotUniform object at 0x7fcb3e96a7b8>
regularizer = <local> None
reuse = <local> <_ReuseMode.AUTO_REUSE: 1>
trainable = <local> None
collections = <local> None
caching_device = <local> None
partitioner = <local> None
validate_shape = <local> True
use_resource = <local> None
custom_getter = <local> <function ReuseParams.get_variable_scope.<locals>._variable_custom_getter at 0x7fcb3e9616a8>
constraint = <local> None
synchronization = <local> <VariableSynchronization.AUTO: 0>
aggregation = <local> <VariableAggregation.NONE: 0>
File "/u/hilmes/opt/returnn/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 530, in get_variable
line: return custom_getter(**custom_getter_kwargs)
locals:
custom_getter = <local> <function ReuseParams.get_variable_scope.<locals>._variable_custom_getter at 0x7fcb3e9616a8>
      custom_getter_kwargs = <local> {'use_resource': None, 'caching_device': None, 'collections': None, 'shape': (35356, 512), 'initializer': <tensorflow.python.ops.init_ops.GlorotUniform object at 0x7fcb3e96a7b8>, 'name': 'source_embed_raw/W', 'synchronization': <VariableSynchronization.AUTO: 0>, 'validate_shape': True, 'getter': ..., len = 16
File "/u/hilmes/returnn/TFNetworkLayer.py", line 1537, in _variable_custom_getter
line: return self.variable_custom_getter(base_layer=base_layer, **kwargs_)
locals:
self = <local> <TFNetworkLayer.ReuseParams object at 0x7fcb3e959ac8>
self.variable_custom_getter = <local> <bound method ReuseParams.variable_custom_getter of <TFNetworkLayer.ReuseParams object at 0x7fcb3e959ac8>>
base_layer = <local> <LinearLayer 'source_embed_raw' out_type=Data(shape=(None, 512), batch_shape_meta=[B,T|'time:var:extern_data:data',F|512])>
kwargs_ = <local> {'aggregation': <VariableAggregation.NONE: 0>, 'partitioner': None, 'caching_device': None, 'use_resource': None, 'getter': <function _VariableStore.get_variable.<locals>._true_getter at 0x7fcb3e961730>, 'name': 'source_embed_raw/W', 'synchronization': <VariableSynchronization.AUTO: 0>, 'validate..., len = 16
File "/u/hilmes/returnn/TFNetworkLayer.py", line 1575, in variable_custom_getter
line: return self.param_map[param_name].variable_custom_getter(
getter=getter, name=name, base_layer=base_layer, **kwargs)
locals:
self = <local> <TFNetworkLayer.ReuseParams object at 0x7fcb3e959ac8>
self.param_map = <local> {'W': <TFNetworkLayer.ReuseParams object at 0x7fcb3e959c18>, 'b': <TFNetworkLayer.ReuseParams object at 0x7fcb3e959dd8>}
param_name = <local> 'W'
variable_custom_getter = <not found>
getter = <local> <function _VariableStore.get_variable.<locals>._true_getter at 0x7fcb3e961730>
name = <local> 'source_embed_raw/W', len = 18
base_layer = <local> <LinearLayer 'source_embed_raw' out_type=Data(shape=(None, 512), batch_shape_meta=[B,T|'time:var:extern_data:data',F|512])>
kwargs = <local> {'partitioner': None, 'caching_device': None, 'use_resource': None, 'dtype': tf.float32, 'synchronization': <VariableSynchronization.AUTO: 0>, 'validate_shape': True, 'initializer': <tensorflow.python.ops.init_ops.GlorotUniform object at 0x7fcb3e96a7b8>, 'regularizer': None, 'constraint': None, '..., len = 14
File "/u/hilmes/returnn/TFNetworkLayer.py", line 1576, in variable_custom_getter
line: if self.reuse_layer:
locals:
self = <local> <TFNetworkLayer.ReuseParams object at 0x7fcb3e959c18>
self.reuse_layer = <local> !KeyError: 'output/pivot_target_embed_raw'
File "/u/hilmes/returnn/TFNetworkLayer.py", line 1495, in reuse_layer
line: self._reuse_layer = self._reuse_layer.get_layer()
locals:
self = <local> <TFNetworkLayer.ReuseParams object at 0x7fcb3e959c18>
self._reuse_layer = <local> <TFNetworkLayer.ReuseParams.LazyLayerResolver object at 0x7fcb3e959b38>
self._reuse_layer.get_layer = <local> <bound method ReuseParams.LazyLayerResolver.get_layer of <TFNetworkLayer.ReuseParams.LazyLayerResolver object at 0x7fcb3e959b38>>
File "/u/hilmes/returnn/TFNetworkLayer.py", line 1424, in get_layer
line: return self.create_dummy_layer(dep_loop_exception=exc)
locals:
self = <local> <TFNetworkLayer.ReuseParams.LazyLayerResolver object at 0x7fcb3e959b38>
self.create_dummy_layer = <local> <bound method ReuseParams.LazyLayerResolver.create_dummy_layer of <TFNetworkLayer.ReuseParams.LazyLayerResolver object at 0x7fcb3e959b38>>
dep_loop_exception = <not found>
exc = <not found>
File "/u/hilmes/returnn/TFNetworkLayer.py", line 1467, in create_dummy_layer
line: layer_desc = dep_loop_exception.net_dict[self.layer_name].copy()
locals:
layer_desc = <not found>
dep_loop_exception = <local> NetworkConstructionDependencyLoopException("Error: There is a dependency loop on layer 'output'.\nConstruction stack (most recent first):\n source_embed_weighted\n source_embed_with_pos\n source_embed\n enc_01_self_att_out\n enc_01_ff_out\n enc_01\n enc_02_self_att_out\n enc_02_ff_out\n ...
dep_loop_exception.net_dict = <local> {'enc_06_self_att_laynorm': {'class': 'layer_norm', 'from': ['enc_05']}, 'source_embed_weighted': {'class': 'eval', 'from': ['source_embed_raw'], 'eval': 'source(0) * 22.627417'}, 'enc_01_ff_drop': {'dropout': 0.1, 'class': 'dropout', 'from': ['enc_01_ff_conv2']}, 'enc_05_ff_drop': {'dropout': 0...., len = 98
self = <local> <TFNetworkLayer.ReuseParams.LazyLayerResolver object at 0x7fcb3e959b38>
self.layer_name = <local> 'output/pivot_target_embed_raw', len = 29
copy = <not found>
KeyError: 'output/pivot_target_embed_raw'
Is it possible that the create_dummy_layer function cannot handle a layer that is part of a subnetwork, or am I using reuse_params incorrectly?
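For reference, I already use the same mechanism for weight tying inside the rec unit, where both layers live in the same subnetwork and the layer name resolves directly (excerpt from the config below):
'output_prob': {'class': 'softmax',
                'from': ['decoder'],
                'reuse_params': {'map': {'W': {'custom': None, 'reuse_layer': 'target_embed_raw'}, 'b': None}},
                'target': 'classes',
                'with_bias': True},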
EDIT: a shortened version of the config:
network = { 'dec_01_att_key': {'axis': 'F', 'class': 'split_dims', 'dims': (8, 64), 'from': ['dec_01_att_key0']},
'dec_01_att_key0': { 'activation': None,
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
'from': ['encoder'],
'n_out': 512,
'with_bias': False},
'dec_01_att_value': {'axis': 'F', 'class': 'split_dims', 'dims': (8, 64), 'from': ['dec_01_att_value0']},
'dec_01_att_value0': { 'activation': None,
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
'from': ['encoder'],
'n_out': 512,
'with_bias': False},
'decision': {'class': 'decide', 'from': ['output'], 'loss': 'edit_distance', 'loss_opts': {}, 'target': 'classes'},
'enc_01': {'class': 'copy', 'from': ['enc_01_ff_out']},
'enc_01_ff_conv1': { 'activation': 'relu',
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
'from': ['enc_01_ff_laynorm'],
'n_out': 2048,
'with_bias': True},
'enc_01_ff_conv2': { 'activation': None,
'class': 'linear',
'dropout': 0.1,
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
'from': ['enc_01_ff_conv1'],
'n_out': 512,
'with_bias': True},
'enc_01_ff_drop': {'class': 'dropout', 'dropout': 0.1, 'from': ['enc_01_ff_conv2']},
'enc_01_ff_laynorm': {'class': 'layer_norm', 'from': ['enc_01_self_att_out']},
'enc_01_ff_out': {'class': 'combine', 'from': ['enc_01_self_att_out', 'enc_01_ff_drop'], 'kind': 'add', 'n_out': 512},
'enc_01_self_att_att': { 'attention_dropout': 0.1,
'attention_left_only': False,
'class': 'self_attention',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
'from': ['enc_01_self_att_laynorm'],
'n_out': 512,
'num_heads': 8,
'total_key_dim': 512},
'enc_01_self_att_drop': {'class': 'dropout', 'dropout': 0.1, 'from': ['enc_01_self_att_lin']},
'enc_01_self_att_laynorm': {'class': 'layer_norm', 'from': ['source_embed']},
'enc_01_self_att_lin': { 'activation': None,
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
'from': ['enc_01_self_att_att'],
'n_out': 512,
'with_bias': False},
'enc_01_self_att_out': {'class': 'combine', 'from': ['source_embed', 'enc_01_self_att_drop'], 'kind': 'add', 'n_out': 512},
'encoder': {'class': 'layer_norm', 'from': ['enc_01']},
'output': { 'class': 'rec',
'from': [],
'max_seq_len': "max_len_from('base:encoder') * 3",
'target': 'classes',
'unit': { 'dec_01': {'class': 'copy', 'from': ['dec_01_ff_out']},
'dec_01_att0': {'base': 'base:dec_01_att_value', 'class': 'generic_attention', 'weights': 'dec_01_att_weights_drop'},
'dec_01_att_att': {'axes': 'static', 'class': 'merge_dims', 'from': ['dec_01_att0']},
'dec_01_att_drop': {'class': 'dropout', 'dropout': 0.1, 'from': ['dec_01_att_lin']},
'dec_01_att_energy': { 'class': 'dot',
'from': ['base:dec_01_att_key', 'dec_01_att_query'],
'red1': -1,
'red2': -1,
'var1': 'T',
'var2': 'T?'},
'dec_01_att_laynorm': {'class': 'layer_norm', 'from': ['dec_01_self_att_out']},
'dec_01_att_lin': { 'activation': None,
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
'from': ['dec_01_att_att'],
'n_out': 512,
'with_bias': False},
'dec_01_att_out': {'class': 'combine', 'from': ['dec_01_self_att_out', 'dec_01_att_drop'], 'kind': 'add', 'n_out': 512},
'dec_01_att_query': {'axis': 'F', 'class': 'split_dims', 'dims': (8, 64), 'from': ['dec_01_att_query0']},
'dec_01_att_query0': { 'activation': None,
'class': 'linear',
                                                 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
'from': ['dec_01_att_laynorm'],
'n_out': 512,
'with_bias': False},
'dec_01_att_weights': {'class': 'softmax_over_spatial', 'energy_factor': 0.125, 'from': ['dec_01_att_energy']},
'dec_01_att_weights_drop': { 'class': 'dropout',
'dropout': 0.1,
'dropout_noise_shape': {'*': None},
'from': ['dec_01_att_weights']},
'dec_01_ff_conv1': { 'activation': 'relu',
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
'from': ['dec_01_ff_laynorm'],
'n_out': 2048,
'with_bias': True},
'dec_01_ff_conv2': { 'activation': None,
'class': 'linear',
'dropout': 0.1,
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
'from': ['dec_01_ff_conv1'],
'n_out': 512,
'with_bias': True},
'dec_01_ff_drop': {'class': 'dropout', 'dropout': 0.1, 'from': ['dec_01_ff_conv2']},
'dec_01_ff_laynorm': {'class': 'layer_norm', 'from': ['dec_01_att_out']},
'dec_01_ff_out': {'class': 'combine', 'from': ['dec_01_att_out', 'dec_01_ff_drop'], 'kind': 'add', 'n_out': 512},
'dec_01_self_att_att': { 'attention_dropout': 0.1,
'attention_left_only': True,
'class': 'self_attention',
                                                 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
'from': ['dec_01_self_att_laynorm'],
'n_out': 512,
'num_heads': 8,
'total_key_dim': 512},
'dec_01_self_att_drop': {'class': 'dropout', 'dropout': 0.1, 'from': ['dec_01_self_att_lin']},
'dec_01_self_att_laynorm': {'class': 'layer_norm', 'from': ['target_embed']},
'dec_01_self_att_lin': { 'activation': None,
'class': 'linear',
                                                 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
'from': ['dec_01_self_att_att'],
'n_out': 512,
'with_bias': False},
'dec_01_self_att_out': {'class': 'combine', 'from': ['target_embed', 'dec_01_self_att_drop'], 'kind': 'add', 'n_out': 512},
'decoder': {'class': 'layer_norm', 'from': ['dec_01']},
'end': {'class': 'compare', 'from': ['output'], 'value': 0},
'output': {'beam_size': 12, 'class': 'choice', 'from': ['output_prob'], 'initial_output': 0, 'target': 'classes'},
'output_prob': { 'class': 'softmax',
'dropout': 0.0,
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
'from': ['decoder'],
'loss': 'ce',
'loss_opts': {'use_normalized_loss': True},
'reuse_params': {'map': {'W': {'custom': None, 'reuse_layer': 'target_embed_raw'}, 'b': None}},
'target': 'classes',
'with_bias': True},
'target_embed': {'class': 'dropout', 'dropout': 0.1, 'from': ['target_embed_with_pos']},
'target_embed_raw': { 'activation': None,
'class': 'linear',
                                              'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
'from': ['prev:output'],
'n_out': 512,
'with_bias': False},
'target_embed_weighted': {'class': 'eval', 'eval': 'source(0) * 22.627417', 'from': ['target_embed_raw'], 'trainable': False},
'target_embed_with_pos': { 'add_to_input': True,
'class': 'positional_encoding',
'from': ['target_embed_weighted']}},
'pivot_target_embed_raw': { 'activation': None,
'class': 'linear',
                                              'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
#'from': ['prev:output'],
'n_out': 512,
'trainable': False,
'with_bias': False}
},
'source_embed': {'class': 'dropout', 'dropout': 0.1, 'from': ['source_embed_with_pos']},
'source_embed_raw': { 'activation': None,
'class': 'linear',
#'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
'from': ['data:data'],
'n_out': 512,
'with_bias': False,
#'reuse_params': {'map': {'W': {'reuse_layer': 'pivot_source_embed_raw'}, 'b': None}},
'reuse_params': {'map': {'W': {'reuse_layer': 'output/pivot_target_embed_raw'}, 'b': None}}
},
'source_embed_weighted': {'class': 'eval', 'eval': 'source(0) * 22.627417', 'from': ['source_embed_raw']},
'source_embed_with_pos': {'add_to_input': True, 'class': 'positional_encoding', 'from': ['source_embed_weighted']}}
pivot_file = [Pathplaceholder]
pivot_prefix = 'pivot_'
preload_from_files = {}
if task != "search":
    preload_from_files = {
        "pivot": {"filename": pivot_file, "prefix": pivot_prefix, "init_for_train": True},
}
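A workaround I am considering, in case anyone can comment on whether it should work: move pivot_target_embed_raw out of the unit into the root network, so that source_embed_raw can reference it directly, and address it from inside the unit via the 'base:' prefix where needed. A rough, untested sketch; I am not sure the preload_from_files prefix matching would still line up with the variable names in the pivot checkpoint:
# untested: pivot layer at root level instead of inside the 'output' unit
network['pivot_target_embed_raw'] = {
    'activation': None, 'class': 'linear', 'n_out': 512,
    'trainable': False, 'with_bias': False}
# both layers would then live in the same (root) network, so the name should resolve:
network['source_embed_raw']['reuse_params'] = {
    'map': {'W': {'reuse_layer': 'pivot_target_embed_raw'}, 'b': None}}
# from inside the rec unit it would be addressed as 'base:pivot_target_embed_raw'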