I am trying to build a simple text (name) generator using an RNN. Building the model works fine, but when I try to predict values I always get the same letter back.
My code looks like this:

```
import numpy as np
from tensorflow.keras.activations import softmax
from tensorflow.keras.losses import categorical_crossentropy
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# parameters
LSTM_NODES = 100
MAX_NAME_LEN = 30
STOP_MARKER = '.'

# hyper-parameters
EPOCHS = 10

# read names.train into an array
names = open('names.train', encoding='utf-8').read().strip().split('\n')

# precompute the number of samples
SAMPLES = 0
for name in names:
    for _ in name:
        SAMPLES = SAMPLES + 1

# get a sorted list of all unique characters used
corpus = sorted(list({l for name in names for l in name}))

# the first letter in the corpus must be the stop indicator
corpus.insert(0, STOP_MARKER)

# write out the corpus so that the predict script can use it
open('corpus.txt', 'w').write('\n'.join(corpus))

# calculate the input shape for the network
input_shape = (MAX_NAME_LEN, len(corpus))

# create a mapping from unique characters to indices
char2idx = {u: i for i, u in enumerate(corpus)}
idx2char = np.array(corpus)

# decode a one-hot encoded sample back into a string
def get_text(sample):
    t = ''
    for x in sample:
        n = idx2char[np.argmax(x)]
        t = t + n
    return t

# I need a 3-D array: samples x character position x character one-hot encoded
X = np.zeros((SAMPLES, MAX_NAME_LEN, len(corpus)), int)
Y = np.zeros((SAMPLES, len(corpus)), int)

# for each sample name
for name in names:
    # the number of samples for this name equals the number of letters (we add one letter per loop)
    for i in range(len(name)):
        j = 0
        # create one sample
        while j <= i:
            one_hot_letter = np.zeros(len(corpus), int)
            one_hot_letter[char2idx[name[j]]] = 1
            X[i, j] = one_hot_letter
            j = j + 1
        # get the next character in the sequence
        one_hot_next = np.zeros(len(corpus), int)
        if j < len(name):
            one_hot_next[char2idx[name[j]]] = 1
        # add this character to the Y sample
        Y[i] = one_hot_next
        # print this sample
        print('X={} Y={}'.format(get_text(X[i]), idx2char[np.argmax(Y[i])]))

# build the model
model = Sequential()
model.add(LSTM(LSTM_NODES, input_shape=input_shape))
model.add(Dense(input_shape[1], activation=softmax))
model.compile(loss=categorical_crossentropy, optimizer='adam')
model.summary()

# train the model
model.fit(X, Y, epochs=EPOCHS)

# save the model
model.save('model.h5')

# try a sample prediction; the first letter is the seed
SEED = 'M'
name = SEED
x = np.zeros((1, input_shape[0], input_shape[1]), int)
one_hot_letter = np.zeros(len(corpus), int)
one_hot_letter[char2idx[SEED]] = 1
x[0, 0] = one_hot_letter
for i in range(1, MAX_NAME_LEN):
    predictions = model.predict(x)
    # get the next letter and add it to the sequence
    next_letter = np.zeros(input_shape[1], int)
    next_letter[np.argmax(predictions[0])] = 1
    x[0, i] = next_letter
    name = name + idx2char[np.argmax(next_letter)]
    print(name)
```
At the end, it prints:

```
Mww
Mwww
Mwwww
Mwwwww
Mwwwwww
Mwwwwwww
Mwwwwwwww
Mwwwwwwwww
Mwwwwwwwwww
Mwwwwwwwwwww
Mwwwwwwwwwwww
Mwwwwwwwwwwwww
Mwwwwwwwwwwwwww
Mwwwwwwwwwwwwwww
Mwwwwwwwwwwwwwwww
Mwwwwwwwwwwwwwwwww
Mwwwwwwwwwwwwwwwwww
Mwwwwwwwwwwwwwwwwwww
Mwwwwwwwwwwwwwwwwwwww
Mwwwwwwwwwwwwwwwwwwwww
Mwwwwwwwwwwwwwwwwwwwwww
Mwwwwwwwwwwwwwwwwwwwwwww
Mwwwwwwwwwwwwwwwwwwwwwwww
Mwwwwwwwwwwwwwwwwwwwwwwwww
Mwwwwwwwwwwwwwwwwwwwwwwwwww
Mwwwwwwwwwwwwwwwwwwwwwwwwwww
Mwwwwwwwwwwwwwwwwwwwwwwwwwwww
Mwwwwwwwwwwwwwwwwwwwwwwwwwwwww
```

Any ideas what might be wrong? I think my samples are fine; I used them in another example written by someone else, and they produced different results. There are 280 names in my training set. This is what names.train looks like:

```
Adaldrida
Celendine
Gloriana
Pimpernel
Tanta
Alfrida
Cora
Goldilocks
Melba
```
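For reference, each name should expand into one (prefix, next letter) sample per letter, which matches the `X=... Y=...` lines printed during training. A standalone sketch of that pattern (illustration only; `Cora` is just one of the names above, and `STOP_MARKER` is the `.` from the script):

```
# illustration of how one name expands into prefix -> next-letter samples
STOP_MARKER = '.'
name = 'Cora'
for i in range(len(name)):
    prefix = name[:i + 1]
    target = name[i + 1] if i + 1 < len(name) else STOP_MARKER
    print('X={} Y={}'.format(prefix, target))
# X=C Y=o
# X=Co Y=r
# X=Cor Y=a
# X=Cora Y=.
```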
The full training output:

```
[snip]
X=Valde......................... Y=m
X=Valdem........................ Y=a
X=Valdema....................... Y=r
X=Valdemar...................... Y=.
2020-03-09 13:38:26.827190: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-03-09 13:38:26.843439: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fa8f211d590 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-03-09 13:38:26.843450: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm (LSTM)                  (None, 100)               58800
_________________________________________________________________
dense (Dense)                (None, 46)                4646
=================================================================
Total params: 63,446
Trainable params: 63,446
Non-trainable params: 0
_________________________________________________________________
Train on 1795 samples
Epoch 1/10
1795/1795 [==============================] - 2s 1ms/sample - loss: 0.0168
Epoch 2/10
1795/1795 [==============================] - 1s 462us/sample - loss: 0.0167
Epoch 3/10
1795/1795 [==============================] - 1s 445us/sample - loss: 0.0164
Epoch 4/10
1795/1795 [==============================] - 1s 450us/sample - loss: 0.0163
Epoch 5/10
1795/1795 [==============================] - 1s 449us/sample - loss: 0.0162
Epoch 6/10
1795/1795 [==============================] - 1s 453us/sample - loss: 0.0160
Epoch 7/10
1795/1795 [==============================] - 1s 593us/sample - loss: 0.0159
Epoch 8/10
1795/1795 [==============================] - 1s 599us/sample - loss: 0.0160
Epoch 9/10
1795/1795 [==============================] - 1s 442us/sample - loss: 0.0160
Epoch 10/10
1795/1795 [==============================] - 1s 440us/sample - loss: 0.0160
Mw
Mww
Mwww
Mwwww
Mwwwww
Mwwwwww
Mwwwwwww
Mwwwwwwww
Mwwwwwwwww
Mwwwwwwwwww
Mwwwwwwwwwww
[snip]
```
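In case it is relevant, the separate predict script rebuilds its lookups from the files the training script writes. A minimal sketch of that side (assuming only what the script above saves, i.e. corpus.txt and model.h5):

```
# sketch of the predict-side setup; assumes corpus.txt and model.h5
# were produced by the training script above
import numpy as np
from tensorflow.keras.models import load_model

corpus = open('corpus.txt', encoding='utf-8').read().split('\n')
char2idx = {u: i for i, u in enumerate(corpus)}
idx2char = np.array(corpus)
model = load_model('model.h5')
```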