Question

I'm using fine-tuned BERT and ALBERT models for question answering, and I'm evaluating their performance on a subset of questions from SQuAD v2.0 with the official SQuAD evaluation script.

I'm using Huggingface transformers. Below is the actual code and the example I'm running (this may also help people who are trying to run a fine-tuned ALBERT model on SQuAD v2.0):

import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("ktrapeznikov/albert-xlarge-v2-squad-v2")
model = AutoModelForQuestionAnswering.from_pretrained("ktrapeznikov/albert-xlarge-v2-squad-v2")

question = "Why aren't the examples of bouregois architecture visible today?"
text = """Exceptional examples of the bourgeois architecture of the later periods were not restored by the communist authorities after the war (like mentioned Kronenberg Palace and Insurance Company Rosja building) or they were rebuilt in socialist realism style (like Warsaw Philharmony edifice originally inspired by Palais Garnier in Paris). Despite that the Warsaw University of Technology building (1899\u20131902) is the most interesting of the late 19th-century architecture. Some 19th-century buildings in the Praga district (the Vistula\u2019s right bank) have been restored although many have been poorly maintained. Warsaw\u2019s municipal government authorities have decided to rebuild the Saxon Palace and the Br\u00fchl Palace, the most distinctive buildings in prewar Warsaw."""

input_dict = tokenizer.encode_plus(question, text, return_tensors="pt")
input_ids = input_dict["input_ids"].tolist()
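with torch.no_grad():
    # [:2] gives (start_logits, end_logits) for both tuple outputs and ModelOutput objects
    start_scores, end_scores = model(**input_dict)[:2]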

all_tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
answer = ' '.join(all_tokens[torch.argmax(start_scores) : torch.argmax(end_scores)+1]).replace('▁', '')
print(answer)
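A side note on the last lines of this snippet: joining the tokens with ' ' puts a space between every SentencePiece piece, which is why words come out split in the output (e.g. "aren ' t", "bour ego is"). Below is a minimal pure-Python sketch of the usual SentencePiece detokenization. The piece list is an assumed reconstruction of the tokens behind that output, and pieces_to_text is a hypothetical helper; with a real tokenizer, tokenizer.convert_tokens_to_string(...) serves the same purpose.

```python
from typing import List

SPIECE_UNDERLINE = "\u2581"  # "▁", the marker SentencePiece puts at the start of a word

def pieces_to_text(pieces: List[str]) -> str:
    """Hypothetical helper: turn SentencePiece pieces back into plain text."""
    # Concatenate the pieces with no separator, then turn each word marker into a space.
    return "".join(pieces).replace(SPIECE_UNDERLINE, " ").strip()

# Assumed pieces for the start of the question (illustrative, not real model output).
pieces = ["▁why", "▁aren", "'", "t", "▁the", "▁examples", "▁of", "▁bour", "ego", "is"]

# What the ' '.join(...).replace('▁', '') approach produces:
print(" ".join(pieces).replace(SPIECE_UNDERLINE, ""))  # why aren ' t the examples of bour ego is
# Joining without separators first keeps words whole:
print(pieces_to_text(pieces))                          # why aren't the examples of bouregois
```

This only fixes the spacing; special tokens such as [CLS] and [SEP] still have to be handled separately.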

The output looks like this:

[CLS] why aren ' t the examples of bour ego is architecture visible today ? [SEP] exceptional examples of the  bourgeois architecture of the later periods were not restored by the communist authorities after the war

As you can see, the answer contains BERT's special tokens, including [CLS] and [SEP].

I understand that when the answer is just [CLS] (i.e. torch.argmax returns tensor(0) for both start_scores and end_scores), it basically means the model thinks the context contains no answer to the question, which makes sense. In those cases I simply set the answer for that question to an empty string when running the evaluation script.

But I wonder: in cases like the example above, should I again assume that the model could not find an answer and set the answer to an empty string? Or should I just keep such an answer as it is when evaluating the model's performance?

I'm asking because, as far as I understand, answers like this can change the performance computed by the evaluation script (correct me if I'm wrong), and then I can't get a realistic picture of how these models perform.
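The three situations discussed in the question (a bare [CLS], a span that starts on [CLS] or in the question, and a span that lies inside the context) can be told apart mechanically. A minimal sketch, assuming a single unpadded example encoded as [CLS] question [SEP] context [SEP], so that token_type_ids is 0 for the first segment and 1 for the context plus the final [SEP]. classify_argmax_span is a hypothetical helper, not part of transformers.

```python
from typing import List

def classify_argmax_span(start: int, end: int, token_type_ids: List[int]) -> str:
    """Hypothetical helper: label an argmax (start, end) prediction."""
    last = len(token_type_ids) - 1  # position of the final [SEP] (assumes no padding)

    def in_context(i: int) -> bool:
        # Context tokens belong to segment 1; the trailing [SEP] is excluded.
        return token_type_ids[i] == 1 and i != last

    if start == 0 and end == 0:
        return "no_answer"  # both pointers on [CLS]: the SQuAD v2 "no answer" convention
    if start > end or not (in_context(start) and in_context(end)):
        return "invalid"    # e.g. starts on [CLS] or in the question, like the output above
    return "span"

# Toy layout: [CLS] q q [SEP] c c c c c [SEP]
ids = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
print(classify_argmax_span(0, 0, ids))  # no_answer
print(classify_argmax_span(0, 8, ids))  # invalid
print(classify_argmax_span(4, 8, ids))  # span
```

Whether an "invalid" prediction should then become the empty string is exactly what is being asked; the answer below discards such spans and falls back to the best valid one.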

cronoik · Answer 1 · February 14, 2020

You should simply treat them as invalid, because you are trying to predict a proper answer span from the text variable. Everything else should be invalid. This is also how huggingface handles such predictions:

We could hypothetically create invalid predictions, e.g., predict that the start of the span is in the question. We throw out all invalid predictions.
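The filtering described in that comment can be sketched in a few lines. This is a simplified, single-window version of what compute_predictions_logits does, under these assumptions: context_mask marks the positions that belong to the context (in transformers this role is played by the feature's token_to_orig_map), only the top n_best_size start and end positions are considered, and the null score is the [CLS] start logit plus the [CLS] end logit. best_valid_span is a hypothetical name, and the logits in the usage lines are made up.

```python
from typing import List, Optional, Tuple

def best_valid_span(
    start_logits: List[float],
    end_logits: List[float],
    context_mask: List[bool],
    n_best_size: int = 20,
    max_answer_length: int = 30,
    null_score_diff_threshold: float = 0.0,
) -> Optional[Tuple[int, int]]:
    """Return the best (start, end) token span inside the context, or None for "no answer"."""

    def top_k(logits: List[float]) -> List[int]:
        # Indices of the n_best_size largest logits, best first.
        return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:n_best_size]

    best, best_score = None, float("-inf")
    for s in top_k(start_logits):
        for e in top_k(end_logits):
            # Throw out invalid predictions: outside the context, reversed, or too long.
            if not (context_mask[s] and context_mask[e]):
                continue
            if e < s or e - s + 1 > max_answer_length:
                continue
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score

    null_score = start_logits[0] + end_logits[0]  # both pointers on [CLS]
    if best is None or null_score - best_score > null_score_diff_threshold:
        return None  # predict the empty string
    return best

# Toy layout: [CLS] q q [SEP] c c c c c [SEP]   (illustrative logits, not real model output)
mask = [False] * 4 + [True] * 5 + [False]
start = [5.0, 0.1, 0.2, 0.0, 4.0, 0.5, 0.3, 0.2, 0.1, 0.0]
end = [1.0, 0.0, 0.1, 0.0, 0.2, 0.3, 0.4, 0.5, 6.0, 0.0]
print(best_valid_span(start, end, mask))  # (4, 8), whereas plain argmax would give (0, 8)
```

The design point is that argmax is taken over valid spans only, and "no answer" is decided by comparing the best valid span against the [CLS] score, not by whether argmax happens to land on [CLS].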

You should also note that they use a more sophisticated method to get the prediction for each question (don't ask me why they show torch.argmax in their example). Please have a look at the example below:

from transformers.data.processors.squad import SquadResult, SquadExample, SquadFeatures, SquadV2Processor, squad_convert_examples_to_features
from transformers.data.metrics.squad_metrics import compute_predictions_logits, squad_evaluate

###
#your example code
###

with torch.no_grad():
    outputs = model(**input_dict)

def to_list(tensor):
    return tensor.detach().cpu().tolist()

# outputs[:2] is (start_logits, end_logits); [0] selects the single example in the batch
start_logits, end_logits = [to_list(logits[0]) for logits in outputs[:2]]

all_results = []
all_results.append(SquadResult(1000000000, start_logits, end_logits))

#this is the answers section from the evaluation dataset
answers = [{'text':'not restored by the communist authorities', 'answer_start':77}, {'text':'were not restored', 'answer_start':72}, {'text':'not restored by the communist authorities after the war', 'answer_start':77}]

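# SquadExample(qas_id, question_text, context_text, answer_text, start_position_character, title, answers, is_impossible)
examples = [SquadExample('0', question, text, 'not restored by the communist authorities', 77, 'Warsaw', answers, False)]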

#this does basically the same as tokenizer.encode_plus() but stores the result in SquadFeatures objects and splits long contexts if necessary
features = squad_convert_examples_to_features(examples, tokenizer, max_seq_length=512, doc_stride=100, max_query_length=64, is_training=False)

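predictions = compute_predictions_logits(
            examples,
            features,
            all_results,
            20,                    # n_best_size
            30,                    # max_answer_length
            True,                  # do_lower_case
            'pred.file',           # output_prediction_file
            'nbest_file',          # output_nbest_file
            'null_log_odds_file',  # output_null_log_odds_file
            False,                 # verbose_logging
            True,                  # version_2_with_negative
            0.0,                   # null_score_diff_threshold
            tokenizer
            )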

result = squad_evaluate(examples, predictions)

print(predictions)
for x in result.items():
    print(x)
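The 512, 100 and 64 passed to squad_convert_examples_to_features are max_seq_length, doc_stride and max_query_length. They only matter when the context does not fit into one sequence: it is then split into overlapping windows, one SquadFeatures object each. Below is a simplified sketch of the windowing in Google's original BERT run_squad.py, where doc_stride is the step between window starts; the transformers implementation reaches a similar result through the tokenizer's overflow handling and differs in details. doc_windows is a hypothetical helper.

```python
from typing import List, Tuple

def doc_windows(n_doc_tokens: int, n_query_tokens: int,
                max_seq_length: int = 512, doc_stride: int = 100) -> List[Tuple[int, int]]:
    """Return (start, length) windows over the context tokens (simplified sketch)."""
    # 3 positions are reserved for [CLS], the [SEP] after the question, and the final [SEP].
    max_tokens_for_doc = max_seq_length - n_query_tokens - 3
    windows = []
    start = 0
    while start < n_doc_tokens:
        length = min(n_doc_tokens - start, max_tokens_for_doc)
        windows.append((start, length))
        if start + length == n_doc_tokens:
            break  # this window reaches the end of the context
        start += min(length, doc_stride)
    return windows

print(doc_windows(150, 20))   # [(0, 150)]
print(doc_windows(1000, 9))   # [(0, 500), (100, 500), (200, 500), (300, 500), (400, 500), (500, 500)]
```

The context in this example is short enough for a single window, which is presumably why one SquadResult with unique_id 1000000000 (the first id squad_convert_examples_to_features assigns) is enough here; with several windows you would need one SquadResult per feature, each carrying that feature's unique_id.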

Output:

OrderedDict([('0', 'communist authorities after the war')])
('exact', 0.0)
('f1', 72.72727272727273)
('total', 1)
('HasAns_exact', 0.0)
('HasAns_f1', 72.72727272727273)
('HasAns_total', 1)
('best_exact', 0.0)
('best_exact_thresh', 0.0)
('best_f1', 72.72727272727273)
('best_f1_thresh', 0.0)
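The exact and f1 numbers can be reproduced by hand. The sketch below follows the normalization and token-overlap F1 of the official SQuAD v2.0 evaluation script (lowercase, strip punctuation and the articles a/an/the, collapse whitespace) and takes the maximum over the gold answers; it is a re-implementation for illustration, not the script itself. It also shows why the choice in the question matters: for a question that has an answer, an empty prediction scores 0.

```python
import collections
import re
import string

_PUNCT = set(string.punctuation)

def normalize_answer(s: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(gold: str, pred: str) -> float:
    return float(normalize_answer(gold) == normalize_answer(pred))

def f1_score(gold: str, pred: str) -> float:
    gold_toks = normalize_answer(gold).split()
    pred_toks = normalize_answer(pred).split()
    if not gold_toks or not pred_toks:
        # No-answer case: 1 only if both are empty.
        return float(gold_toks == pred_toks)
    common = collections.Counter(gold_toks) & collections.Counter(pred_toks)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

golds = ["not restored by the communist authorities",
         "were not restored",
         "not restored by the communist authorities after the war"]
pred = "communist authorities after the war"
print(100 * max(exact_match(g, pred) for g in golds))  # 0.0
print(100 * max(f1_score(g, pred) for g in golds))     # ~72.73 (8/11), from the third gold answer
print(100 * max(f1_score(g, "") for g in golds))       # 0.0: an empty prediction gets no credit here
```

For an unanswerable question the script uses [''] as the gold list, so there the empty string is the only prediction that scores 1.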

What does it mean when BERT special tokens appear in QA answers on SQuAD?