FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization

About

Neural abstractive summarization models are prone to generating content that is inconsistent with the source document, i.e., unfaithful. Existing automatic metrics do not capture such mistakes effectively. We tackle the problem of evaluating the faithfulness of a generated summary given its source document. We first collect human annotations of faithfulness for outputs from numerous models on two datasets. We find that current models exhibit a trade-off between abstractiveness and faithfulness: outputs with less word overlap with the source document are more likely to be unfaithful. Next, we propose FEQA, an automatic question answering (QA) based metric for faithfulness that leverages recent advances in reading comprehension. Given question-answer pairs generated from the summary, a QA model extracts answers from the document; non-matching answers indicate unfaithful information in the summary. Compared with metrics based on word overlap, embedding similarity, and learned language understanding models, our QA-based metric has significantly higher correlation with human faithfulness scores, especially on highly abstractive summaries.
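The scoring step described above (compare each summary answer with the answer a QA model extracts from the document) can be sketched as follows. This is a minimal sketch, not the authors' implementation. It assumes the question generation and QA models have already run, so the caller supplies two answer lists aligned question by question. It also assumes SQuAD-style token F1 as the answer-matching criterion. The function names `normalize_answer`, `token_f1`, and `feqa_style_score` are hypothetical.

```python
import re
import string
from collections import Counter

# Translation table that deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize_answer(text: str) -> list[str]:
    """SQuAD-style normalization: lowercase, drop punctuation and the
    English articles a/an/the, then split on whitespace."""
    text = text.lower().translate(_PUNCT_TABLE)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two answers.
    Assumption: this is the matching criterion (not confirmed by the page)."""
    pred, ref = normalize_answer(prediction), normalize_answer(reference)
    if not pred or not ref:
        # Two empty answers count as a match; exactly one empty answer is a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def feqa_style_score(summary_answers: list[str], document_answers: list[str]) -> float:
    """Average answer F1 over the question-answer pairs generated from a summary.

    summary_answers[i] is the answer span paired with question i by the
    (hypothetical, external) question generator; document_answers[i] is what a
    QA model extracted from the source document for the same question.
    A low score flags possibly unfaithful summary content.
    """
    if len(summary_answers) != len(document_answers):
        raise ValueError("answer lists must be aligned question by question")
    if not summary_answers:
        raise ValueError("no question-answer pairs to score")
    scores = [token_f1(doc, summ) for summ, doc in zip(summary_answers, document_answers)]
    return sum(scores) / len(scores)


# Hypothetical example: two questions were generated from a summary, and the
# answer found in the document disagrees with the summary on the year.
summary_answers = ["Paris", "in 2019"]
document_answers = ["paris", "2021"]
print(feqa_style_score(summary_answers, document_answers))  # 0.5
```

Averaging per-question F1 gives partial credit when a document answer overlaps the summary answer only in part, rather than requiring an exact string match.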

Esin Durmus, He He, Mona Diab (2020)

Related benchmarks

Task	Dataset	Result
Factual Consistency Evaluation	SummaC	CGS: 53.7	52
Factual Consistency Evaluation	QAGS XSUM	Spearman Correlation: -6.5	39
Factual Consistency Evaluation	QAGS CNNDM	Spearman Correlation: -7.2	38
Factual Consistency Evaluation	TRUE benchmark	PAWS (AUC-ROC): 50	37
Factual Consistency Evaluation	SummEval	Spearman Correlation: 0.2	36
Factual Consistency Evaluation	QAGS-XSum (test)	Pearson Correlation Coefficient: -0.73	35
Factual Consistency Evaluation	FRANK CNNDM	Spearman Correlation: -2.9	30
Factual Consistency Evaluation	SamSum	Spearman Correlation: 0.0	30
Factual Consistency Evaluation	FRANK-XSum (FRK-X)	Spearman Correlation: 1.5	30
Factual Consistency Evaluation	SamSum (test)	Pearson Correlation Coefficient: 2.7	22

Showing 10 of 25 rows
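In benchmarks of this kind, the Spearman and Pearson entries are typically correlations between a metric's scores and human faithfulness judgments, the same kind of comparison behind the abstract's headline claim. The sketch below computes both coefficients in plain Python, with ties averaged before ranking, as in Spearman's rho. The entries above mix scales, so this sketch makes no claim about how each benchmark reports or aggregates its numbers. It returns a raw coefficient in [-1, 1]. The function names and the example data are hypothetical.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: metric scores for four summaries and binary human
# faithfulness labels (1 = faithful, 0 = unfaithful).
metric_scores = [0.9, 0.4, 0.7, 0.1]
human_labels = [1, 0, 1, 0]
print(round(spearman(metric_scores, human_labels), 3))  # 0.894
```

Spearman's rho only rewards a metric for ordering summaries the way humans do. Pearson, by contrast, is also sensitive to how linearly the raw scores track the human labels.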
