Evaluating the Factual Consistency of Abstractive Text Summarization

About

Currently used metrics for assessing summarization algorithms do not account for whether summaries are factually consistent with source documents. We propose a weakly-supervised, model-based approach for verifying factual consistency and identifying conflicts between source documents and a generated summary. Training data is generated by applying a series of rule-based transformations to the sentences of source documents. The factual consistency model is then trained jointly for three tasks: 1) identify whether sentences remain factually consistent after transformation, 2) extract a span in the source documents to support the consistency prediction, 3) extract a span in the summary sentence that is inconsistent if one exists. Transferring this model to summaries generated by several state-of-the-art models reveals that this highly scalable approach substantially outperforms previous models, including those trained with strong supervision using standard datasets for natural language inference and fact checking. Additionally, human evaluation shows that the auxiliary span extraction tasks provide useful assistance in the process of verifying factual consistency.
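To make the data-generation idea in the abstract concrete, here is a minimal sketch of how rule-based transformations could turn source sentences into weakly labeled training examples. The two rules shown (`swap_number`, `negate`), the label strings, and the `make_examples` helper are illustrative assumptions, not the authors' implementation; the abstract only states that "a series of rule-based transformations" is applied.

```python
import random
import re


def swap_number(sentence, rng):
    """Hypothetical rule: replace one whole number with a different one."""
    numbers = re.findall(r"\b\d+\b", sentence)
    if not numbers:
        return None  # rule does not apply to this sentence
    target = rng.choice(numbers)
    replacement = str(int(target) + rng.randint(1, 9))  # always differs from target
    return re.sub(r"\b" + re.escape(target) + r"\b", replacement, sentence, count=1)


def negate(sentence, rng):
    """Hypothetical rule: toggle 'not' after the first simple auxiliary verb."""
    match = re.search(r"\b(is|are|was|were|will|can)\b( not)?", sentence)
    if not match:
        return None  # rule does not apply to this sentence
    aux, has_not = match.group(1), match.group(2)
    new_phrase = aux if has_not else aux + " not"
    return sentence[:match.start()] + new_phrase + sentence[match.end():]


def make_examples(source_sentences, seed=0):
    """Build (claim, label) pairs: the untouched sentence is treated as
    consistent, and each meaning-altering variant as inconsistent."""
    rng = random.Random(seed)
    examples = []
    for sent in source_sentences:
        examples.append({"claim": sent, "label": "CONSISTENT", "rule": None})
        for rule in (swap_number, negate):
            changed = rule(sent, rng)
            if changed is not None and changed != sent:
                examples.append(
                    {"claim": changed, "label": "INCONSISTENT", "rule": rule.__name__}
                )
    return examples


if __name__ == "__main__":
    for example in make_examples(["The company was founded in 1998."]):
        print(example)
```

Because labels come from the rule that produced each claim rather than from human annotation, a pipeline of this shape scales with the amount of source text available, which is the property the abstract highlights.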

Wojciech Kryściński, Bryan McCann, Caiming Xiong, Richard Socher • 2019

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Factual Consistency Evaluation | SummaC | CGS | 64.9 | 52 |
| Factual Consistency Evaluation | QAGS XSUM | Spearman Correlation | 28.8 | 39 |
| Factual Consistency Evaluation | QAGS CNNDM | Spearman Correlation | 40.3 | 38 |
| Factual Consistency Evaluation | TRUE benchmark | PAWS (AUC-ROC) | 53.4 | 37 |
| Factual Consistency Evaluation | SummEval | Spearman Correlation | 33.5 | 36 |
| Factual Consistency Evaluation | FRANK CNNDM | Spearman Correlation | 35.3 | 30 |
| Factual Consistency Evaluation | FRANK-XSum (FRK-X) | Spearman Correlation | 7.9 | 30 |
| Factual Consistency Evaluation | SamSum | Spearman Correlation | -4.4 | 30 |
| Factual Consistency Evaluation | SE | Kendall's Tau | 32.2 | 22 |
| Factual Consistency Evaluation | Q-X | Kendall's Tau | 28.8 | 22 |
Showing 10 of 39 rows
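
Several rows above report a Spearman correlation between a metric's scores and human consistency judgments. As a rough sketch of how such a number is computed, the pure-Python function below ranks both lists (averaging ranks for ties) and takes the Pearson correlation of the ranks. The scores in the usage example are made-up toy values, and the reading of the table's figures as correlations scaled by 100 is an assumption.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Toy values for illustration only, not taken from any benchmark.
    metric_scores = [0.91, 0.15, 0.60, 0.42, 0.77]
    human_ratings = [5, 1, 4, 2, 3]
    print(round(spearman(metric_scores, human_ratings), 3))
```

Since only ranks matter, the correlation is unchanged by any monotonic rescaling of the metric's scores, which is why it suits comparing metrics that output values on different scales.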
