Question

I'm working on an image captioning system in Python using Keras. With argmax search I get reasonable results (a Bleu_1 score of ~0.58, and the sentences are quite diverse).

However, when I try beam search, I get almost the same sentence for every image.

I have the following code for generating captions:

# create an array of captions for a chunk of images; first token
# of each caption is the start token
test_x = np.zeros((chunk_size, self.max_len - 1), dtype=int)
test_x[:, 0] = self.start_idx + 1

# probability of each caption is 1
captions_probs = np.ones(chunk_size)

# for every image, maintain a heap with the best captions 
self.best_captions = [FixedCapacityMaxHeap(20) for i in range(chunk_size)]

# call beam search using the current cnn features
self.beam_search(cnn_feats, test_x, captions_probs, 0, beam_size)

The beam search method is as follows:

def beam_search(self, cnn_feats, generated_captions, captions_probs, t, beam_size):
    # base case: the generated captions have max_len length, so
    # we can remove the (zero) pad at the end and for each image
    # we can insert the generated caption and its probability into
    # the heap with the best captions
    if t == self.max_len - 1:
        for i in range(len(generated_captions)):
            caption = self.remove_zero_pad(list(generated_captions[i]))
            self.best_captions[i].push(list(caption), captions_probs[i])
    else:
        # otherwise, make a prediction (we only keep the element at time
        # step t + 1, as the LSTM has a many-to-many architecture, but we
        # are only interested in the next token for each image)
        pred = self.model.predict(x=[cnn_feats, generated_captions], 
                              batch_size=128,
                              verbose=1)[:, t + 1, :]

        # efficiently get the indices of the tokens with the greatest probability 
        # for each image (they are not necessarily sorted)
        top_idx = np.argpartition(-pred, range(beam_size), axis=1)[:, :beam_size]

        # store the probability of those tokens
        top_probs = pred[np.arange(top_idx.shape[0])[:, None], top_idx]

        # for every 'neighbour' (set of newly generated tokens for every image)
        # get the indices of these tokens, add them to the current captions and 
        # update the captions probabilities by multiplying them with the probabilities
        # of the current tokens, then recursively call beam_search
        for i in range(beam_size):
            curr_idx = top_idx[:, i]
            generated_captions[:, t + 1] = curr_idx
            curr_captions_probs = top_probs[:, i] * captions_probs
            self.beam_search(cnn_feats, generated_captions, curr_captions_probs, t+1, beam_size)
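
The top-k selection step above can be checked in isolation. The following is a minimal sketch with made-up probabilities (the `pred` values and the vocabulary size are assumptions for illustration, not output of the real model). It uses `np.arange(beam_size)` as `kth`, which should behave like the `range(beam_size)` in the method. One thing it shows: when every index from 0 to `beam_size - 1` is passed as `kth`, the selected columns come out ordered from most to least probable.

```python
import numpy as np

# Hypothetical next-token probabilities for 2 images over a 5-token vocabulary.
pred = np.array([[0.10, 0.40, 0.05, 0.30, 0.15],
                 [0.25, 0.05, 0.50, 0.10, 0.10]])
beam_size = 2

# Each position listed in kth ends up holding the element it would hold in a
# full sort, so the first beam_size columns are the top tokens, best first.
top_idx = np.argpartition(-pred, np.arange(beam_size), axis=1)[:, :beam_size]

# Fancy indexing: row i is paired with the columns listed in top_idx[i].
top_probs = pred[np.arange(pred.shape[0])[:, None], top_idx]

print(top_idx)
print(top_probs)
```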

The FixedCapacityMaxHeap I use:

import heapq


class FixedCapacityMaxHeap(object):

    def __init__(self, capacity):
        self.capacity = capacity
        self.h = []

    def push(self, value, priority):
        if len(self.h) < self.capacity:
            heapq.heappush(self.h, (priority, value))
        else:
            heapq.heappushpop(self.h, (priority, value))

    def pop(self):
        if len(self.h) > 0:
            return heapq.nlargest(1, self.h)[0]
        else:
            return None
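
To see what the push logic keeps, here is a small standalone sketch of the same `heappush` / `heappushpop` pattern with hypothetical captions and probabilities (none of these values come from the real model). Because `heapq` is a min-heap, `heappushpop` always discards the lowest-priority entry, so the heap ends up holding the `capacity` best candidates.

```python
import heapq

capacity = 3
heap = []  # min-heap of (priority, value); heap[0] is the weakest kept entry

# Hypothetical (caption, probability) candidates.
candidates = [("a dog", 0.10), ("a cat", 0.40), ("a man", 0.05),
              ("a bus", 0.30), ("a car", 0.20)]

for caption, prob in candidates:
    if len(heap) < capacity:
        heapq.heappush(heap, (prob, caption))
    else:
        # Push the new entry, then drop whichever entry is now the smallest.
        heapq.heappushpop(heap, (prob, caption))

# Sort a copy to read the survivors from best to worst.
best_first = sorted(heap, reverse=True)
print(best_first)
```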

The problem is that the captions generated with beam search are practically identical for every image (e.g. 'scaling the input value', 'scaling the input value', 'scaling the input value'), whereas the argmax version (which simply takes the highest-probability token at each time step) actually produces good captions.

I have been stuck on this for quite a while. I tried a different implementation (computing the caption for each image with a separate beam_search call instead of computing them all at once), and I also experimented with the softmax temperature parameter (which controls how confident the LSTM is in its predictions), but neither seems to solve the problem, so any idea is welcome.
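
For readers unfamiliar with the softmax temperature mentioned above, here is a minimal sketch of the idea. The function name `softmax_with_temperature` and the logits are hypothetical, and it assumes access to the pre-softmax scores, which the code in the question does not show. A temperature below 1 sharpens the distribution (more confident), a temperature above 1 flattens it.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (hypothetical helper)."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled = scaled - scaled.max()  # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = [2.0, 1.0, 0.1]  # made-up scores for a 3-token vocabulary
sharp = softmax_with_temperature(logits, temperature=0.5)
flat = softmax_with_temperature(logits, temperature=2.0)

print(sharp)
print(flat)
```

The ranking of tokens is the same at every temperature, so argmax decoding is unaffected. Only the relative gaps between probabilities change, which is what matters when sequence scores are compared in beam search.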

Mehdi · Answer 1 · June 14, 2018

I wrote this implementation a long time ago, but I hope it helps. It is not recursive:

https://github.com/mmehdig/lm_beam_search/blob/master/beam_search.py

import numpy as np


def search(model, src_input, k=1, sequence_max_len=25):
    # each beam entry is (log-probability, token ids); start with
    # log(1) = 0 and an all-zero sequence
    k_beam = [(0, [0]*(sequence_max_len+1))]

    # l: position in the target sentence that is being predicted
    for l in range(sequence_max_len):
        all_k_beams = []
        for prob, sent_predict in k_beam:
            predicted = model.predict([np.array([src_input]), np.array([sent_predict])])[0]
            # top k!
            possible_k = predicted[l].argsort()[-k:][::-1]

            # add to all possible candidates for k-beams
            all_k_beams += [
                (
                    sum(np.log(predicted[i][sent_predict[i+1]]) for i in range(l)) + np.log(predicted[l][next_wid]),
                    list(sent_predict[:l+1])+[next_wid]+[0]*(sequence_max_len-l-1)
                )
                for next_wid in possible_k
            ]

        # top k
        k_beam = sorted(all_k_beams)[-k:]

    return k_beam
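
The core idea of the function above, keeping only the k best partial sequences after each step and scoring them by summed log-probabilities, can be sketched without Keras. Everything below is an illustrative assumption: the `TRANSITIONS` table stands in for a model, and `next_token_log_probs` and `toy_beam_search` are hypothetical names, not part of the linked repository.

```python
import math

# Hypothetical toy "model": the next-token distribution depends only on the
# last token. Token 0 is the start token; the vocabulary is {0, 1, 2}.
TRANSITIONS = {
    0: [0.1, 0.6, 0.3],
    1: [0.2, 0.3, 0.5],
    2: [0.7, 0.2, 0.1],
}

def next_token_log_probs(prefix):
    """Log-probabilities of each next token given the prefix (toy stand-in)."""
    return [math.log(p) for p in TRANSITIONS[prefix[-1]]]

def toy_beam_search(k, max_len, start_token=0):
    beam = [(0.0, [start_token])]  # (log-probability, tokens); log(1) = 0
    for _ in range(max_len):
        candidates = []
        for score, tokens in beam:
            # Extend every surviving sequence by every possible next token.
            for token, log_prob in enumerate(next_token_log_probs(tokens)):
                candidates.append((score + log_prob, tokens + [token]))
        # Prune: keep only the k highest-scoring partial sequences.
        beam = sorted(candidates, reverse=True)[:k]
    return beam

best = toy_beam_search(k=2, max_len=3)
for score, tokens in best:
    print(tokens, math.exp(score))
```

With pruning, each step expands at most k sequences, so the work grows linearly with the sequence length rather than exponentially with it.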

Python beam search for a Keras LSTM model generating the same sequence
