SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization

About

In the summarization domain, a key requirement for summaries is to be factually consistent with the input document. Previous work has found that natural language inference (NLI) models do not perform competitively when applied to inconsistency detection. In this work, we revisit the use of NLI for inconsistency detection and find that past work suffered from a mismatch in input granularity between NLI datasets (sentence-level) and inconsistency detection (document-level). We provide a highly effective and lightweight method called SummaCConv that enables NLI models to be used successfully for this task by segmenting documents into sentence units and aggregating scores between pairs of sentences. On SummaC (Summary Consistency), our newly introduced benchmark consisting of six large inconsistency detection datasets, SummaCConv obtains state-of-the-art results with a balanced accuracy of 74.4%, a 5-percentage-point improvement over prior work. We make the models and datasets available at https://github.com/tingofurro/summac
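To make the sentence-level idea concrete, here is a minimal, dependency-free sketch. It is not the authors' implementation (their code is in the repository linked above), and every name in it (`split_sentences`, `toy_nli_score`, `score_matrix`, `zero_shot_aggregate`, `conv_aggregate`, `balanced_accuracy`) is hypothetical. Assumptions: `toy_nli_score` is a word-overlap placeholder standing in for a real NLI model and returns a score in [-1, 1]; the "max over document sentences, then mean over summary sentences" aggregation and the "histogram of scores passed through a linear layer" aggregation are an assumed reading of a zero-shot variant and of the convolutional SummaCConv variant; the weights are made up, not trained. Balanced accuracy is the mean of recall on consistent and on inconsistent summaries, with the threshold chosen arbitrarily here.

```python
import re
from statistics import mean


def split_sentences(text):
    """Naive sentence splitter (assumption): split after ., ! or ? plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def toy_nli_score(premise, hypothesis):
    """Hypothetical stand-in for an NLI model, NOT a real one.

    Maps the fraction of hypothesis words found in the premise to [-1, 1].
    A real system would use an NLI model's output probabilities instead.
    """
    p = set(re.findall(r"[a-z]+", premise.lower()))
    h = set(re.findall(r"[a-z]+", hypothesis.lower()))
    if not h:
        return 0.0
    return 2 * len(p & h) / len(h) - 1


def score_matrix(document, summary, nli=toy_nli_score):
    """Pairwise scores. Rows: document sentences (premises); columns: summary sentences."""
    doc_sents = split_sentences(document)
    sum_sents = split_sentences(summary)
    return [[nli(p, h) for h in sum_sents] for p in doc_sents]


def zero_shot_aggregate(matrix):
    """Best-supporting document sentence per summary sentence, then the mean."""
    n_summary = len(matrix[0])
    return mean(max(row[j] for row in matrix) for j in range(n_summary))


def histogram(values, n_bins, lo=-1.0, hi=1.0):
    """Fraction of values in each of n_bins equal-width bins on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def conv_aggregate(matrix, weights, bias=0.0):
    """Histogram per summary sentence -> linear layer -> mean over summary sentences.

    A linear layer over all bins equals a 1-D convolution whose kernel spans
    every bin. In a trained model `weights` and `bias` would be learned.
    """
    per_sentence = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        hist = histogram(column, len(weights))
        per_sentence.append(bias + sum(w * f for w, f in zip(weights, hist)))
    return mean(per_sentence)


def balanced_accuracy(labels, scores, threshold):
    """Mean of recall on consistent (True) and inconsistent (False) summaries."""
    preds = [s >= threshold for s in scores]
    pos = [p for p, y in zip(preds, labels) if y]
    neg = [p for p, y in zip(preds, labels) if not y]
    tpr = sum(pos) / len(pos)                  # consistent predicted consistent
    tnr = sum(not p for p in neg) / len(neg)   # inconsistent predicted inconsistent
    return (tpr + tnr) / 2


if __name__ == "__main__":
    doc = "The cat sat on the mat. It was sunny."
    summ = "The cat sat on the mat. The dog barked."
    m = score_matrix(doc, summ)
    print(round(zero_shot_aggregate(m), 3))                  # 0.333
    print(conv_aggregate(m, [0.0, 0.0, 0.25, 0.5, 1.0]))     # 0.25
```

The max-then-mean rule lets one well-supporting document sentence vouch for each summary sentence. The histogram variant instead looks at the whole distribution of scores for a summary sentence, which is why its weights need to be learned rather than fixed by hand.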

Philippe Laban, Tobias Schnabel, Paul N. Bennett, Marti A. Hearst · 2021

Related benchmarks

Task	Dataset	Metric	Result
Factual Consistency Evaluation	SummaC	CGS	73.6	52
Factual Consistency Evaluation	QAGS XSUM	Spearman Correlation	45	39
Factual Consistency Evaluation	QAGS CNNDM	Spearman Correlation	58.4	38
Factual Consistency Evaluation	TRUE benchmark	PAWS (AUC-ROC)	89	37
Factual Consistency Evaluation	SummEval	Spearman Correlation	41.4	36
Opinion Summarization Metric Evaluation	OPINSUMMEVAL	Aspect Relevance	30	32
Factual Consistency Evaluation	FRANK CNNDM	Spearman Correlation	52.4	30
Factual Consistency Evaluation	SamSum	Spearman Correlation	16.7	30
Factual Consistency Evaluation	FRANK-XSum (FRK-X)	Spearman Correlation	12.8	30
Factual Consistency Evaluation	FRANK CNNDM (test)	PCC	58.7	22

Showing 10 of 41 rows
