Consider the following two models:
from tensorflow.keras.layers import Input, GRU, Dense, TimeDistributed
from tensorflow.keras.models import Model
inputs = Input(batch_shape=(None, None, 100))
gru_out = GRU(32, return_sequences=True)(inputs)
dense = Dense(200, activation='softmax')
decoder_pred = TimeDistributed(dense)(gru_out)
model = Model(inputs=inputs, outputs=decoder_pred)
model.summary()
with the output:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, None, 100)         0
_________________________________________________________________
gru (GRU)                    (None, None, 32)          12768
_________________________________________________________________
time_distributed (TimeDistri (None, None, 200)         6600
=================================================================
Total params: 19,368
Trainable params: 19,368
Non-trainable params: 0
_________________________________________________________________
And the second model:
from tensorflow.keras.layers import Input, GRU, Dense
from tensorflow.keras.models import Model
inputs = Input(batch_shape=(None, None, 100))
gru_out = GRU(32, return_sequences=True)(inputs)
decoder_pred = Dense(200, activation='softmax')(gru_out)
model = Model(inputs=inputs, outputs=decoder_pred)
model.summary()
with the output:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         (None, None, 100)         0
_________________________________________________________________
gru_1 (GRU)                  (None, None, 32)          12768
_________________________________________________________________
dense_1 (Dense)              (None, None, 200)         6600
=================================================================
Total params: 19,368
Trainable params: 19,368
Non-trainable params: 0
_________________________________________________________________
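For reference, the parameter counts reported in both summaries can be reproduced by hand. The sketch below is only an illustration: the helper names `gru_params` and `dense_params` are hypothetical, and the GRU formula assumes the classic variant with a single bias vector per gate, which is the one that matches the 12768 shown above.

```python
def gru_params(input_dim: int, units: int) -> int:
    """Parameter count of a GRU layer, assuming one bias vector per gate.

    Each of the 3 gates (update, reset, candidate) has an input kernel
    (input_dim x units), a recurrent kernel (units x units) and a bias (units).
    """
    return 3 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    """Parameter count of a Dense layer: kernel (input_dim x units) plus bias."""
    return input_dim * units + units


# Sizes taken from the two models above: 100 input features, 32 GRU units,
# 200 output classes.
gru_count = gru_params(100, 32)
dense_count = dense_params(32, 200)
print(gru_count, dense_count, gru_count + dense_count)
```

Neither formula contains the number of time steps, so the count of the final layer is the same whether or not `Dense` is wrapped in `TimeDistributed`.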
My question is: does the TimeDistributed wrapper actually do anything in the first model? Do the two models differ in any respect (given that their total parameter counts are identical)?