Evaluating the Factual Consistency of Abstractive Text Summarization

About

Currently used metrics for assessing summarization algorithms do not account for whether summaries are factually consistent with source documents. We propose a weakly-supervised, model-based approach for verifying factual consistency and identifying conflicts between source documents and a generated summary. Training data is generated by applying a series of rule-based transformations to the sentences of source documents. The factual consistency model is then trained jointly for three tasks: 1) identify whether sentences remain factually consistent after transformation, 2) extract a span in the source documents to support the consistency prediction, and 3) extract a span in the summary sentence that is inconsistent, if one exists. Transferring this model to summaries generated by several state-of-the-art models reveals that this highly scalable approach substantially outperforms previous models, including those trained with strong supervision using standard datasets for natural language inference and fact checking. Additionally, human evaluation shows that the auxiliary span extraction tasks provide useful assistance in the process of verifying factual consistency.
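The weak-supervision step described above can be illustrated with a minimal sketch. The abstract only says that rule-based transformations are applied to source sentences; the two rules shown here (`swap_number`, `negate`), the `Example` record, and the regex-based sentence splitter are hypothetical stand-ins chosen for illustration, not the authors' implementation. Unchanged sentences are labeled consistent and transformed sentences are labeled inconsistent, which yields labels for the first of the three tasks.

```python
import random
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    source: str       # full source document
    claim: str        # original or transformed sentence
    consistent: bool  # label for the consistency classification task


def swap_number(sentence: str, rng: random.Random) -> Optional[str]:
    """Hypothetical rule: replace the first integer with a different one."""
    match = re.search(r"\d+", sentence)
    if match is None:
        return None
    new_value = int(match.group()) + rng.randint(1, 9)  # never equals the original
    return sentence[:match.start()] + str(new_value) + sentence[match.end():]


def negate(sentence: str, rng: random.Random) -> Optional[str]:
    """Hypothetical rule: insert 'not' after the first auxiliary verb."""
    pattern = re.compile(r"\b(is|are|was|were|will|can|has|have)\b")
    if pattern.search(sentence) is None:
        return None
    return pattern.sub(lambda m: m.group() + " not", sentence, count=1)


def split_sentences(document: str) -> List[str]:
    """Naive splitter on sentence-final punctuation (an assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", document)
    return [p.strip() for p in parts if p.strip()]


def build_examples(document: str, rng: random.Random) -> List[Example]:
    """Emit each sentence as consistent and each transformed copy as inconsistent."""
    examples: List[Example] = []
    for sentence in split_sentences(document):
        examples.append(Example(document, sentence, True))
        for rule in (swap_number, negate):
            changed = rule(sentence, rng)
            if changed is not None:
                examples.append(Example(document, changed, False))
    return examples


if __name__ == "__main__":
    doc = "The company was founded in 1998. It employs 250 people."
    for ex in build_examples(doc, random.Random(0)):
        print(ex.consistent, "|", ex.claim)
```

Because the labels come from the rules themselves, no human annotation is needed, which is what makes this kind of data generation scalable. A fuller pipeline would also record which span was altered, so that the two span-extraction tasks have targets.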

Wojciech Kryściński, Bryan McCann, Caiming Xiong, Richard Socher • 2019

Related benchmarks

Task	Dataset	Metric	Value	
Factual Consistency Evaluation	SummaC	CGS	64.9	52
Factual Consistency Evaluation	QAGS XSUM	Spearman Correlation	28.8	39
Factual Consistency Evaluation	QAGS CNNDM	Spearman Correlation	40.3	38
Factual Consistency Evaluation	TRUE benchmark	PAWS (AUC-ROC)	53.4	37
Factual Consistency Evaluation	SummEval	Spearman Correlation	33.5	36
Factual Consistency Evaluation	QAGS-XSum (test)	Pearson Correlation Coefficient	2.88	35
Factual Consistency Evaluation	FRANK CNNDM	Spearman Correlation	35.3	30
Factual Consistency Evaluation	FRANK-XSum (FRK-X)	Spearman Correlation	7.9	30
Factual Consistency Evaluation	SamSum	Spearman Correlation	-4.4	30
Factual Consistency Evaluation	SE	Kendall's Tau	32.2	22

Showing 10 of 41 rows
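
Most rows above report a rank correlation between a metric's scores and human judgments of factual consistency. As a rough sketch of how a Spearman correlation can be computed, the following ranks both score lists (averaging ranks for ties) and takes the Pearson correlation of the ranks. The function names and the toy inputs in the usage lines are made up for illustration and are not drawn from any of the benchmarks listed.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(metric_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))


if __name__ == "__main__":
    # Toy, made-up scores: one metric score and one human rating per summary.
    metric = [0.10, 0.40, 0.35, 0.80]
    human = [1, 3, 2, 4]
    print(spearman(metric, human))
```

Only the ordering of the scores matters here, so a metric can correlate well with human judgments even when its raw values are on a different scale.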
