$\texttt{COSMIC}$: Mutual Information for Task-Agnostic Summarization Evaluation

About

Assessing the quality of summarizers poses significant challenges. In response, we propose a novel task-oriented evaluation approach that assesses summarizers based on their capacity to produce summaries that are useful for downstream tasks, while preserving task outcomes. We theoretically establish a direct relationship between the resulting error probability of these tasks and the mutual information between source texts and generated summaries. We introduce $\texttt{COSMIC}$ as a practical implementation of this metric, demonstrating its strong correlation with human judgment-based metrics and its effectiveness in predicting downstream task performance. Comparative analyses against established metrics like $\texttt{BERTScore}$ and $\texttt{ROUGE}$ highlight the competitive performance of $\texttt{COSMIC}$.
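The quantity at the center of the abstract is the mutual information between a source text and its summary. The sketch below shows one simple way to estimate it from vector representations. It is not the COSMIC estimator, which this page does not describe. Instead, it swaps in the closed-form mutual information of a jointly Gaussian pair, I(X;Y) = ½(log|Σ_X| + log|Σ_Y| − log|Σ_XY|). The function name `gaussian_mutual_information` and the variables `src`, `close`, and `unrelated` are hypothetical. The "embeddings" are synthetic random vectors, not outputs of any real text encoder.

```python
import numpy as np


def gaussian_mutual_information(x: np.ndarray, y: np.ndarray, eps: float = 1e-6) -> float:
    """Estimate I(X;Y) in nats, ASSUMING (X, Y) is jointly Gaussian.

    x: shape (n_samples, d_x), e.g. hypothetical source-text embeddings
    y: shape (n_samples, d_y), e.g. hypothetical summary embeddings
    This is an illustrative stand-in, not the estimator used by COSMIC.
    """
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of samples")
    joint = np.concatenate([x, y], axis=1)
    # Empirical covariance of the concatenated vector (rows are samples).
    cov = np.cov(joint, rowvar=False)
    # A small ridge keeps every log-determinant finite.
    cov = cov + eps * np.eye(cov.shape[0])
    d_x = x.shape[1]
    _, logdet_x = np.linalg.slogdet(cov[:d_x, :d_x])
    _, logdet_y = np.linalg.slogdet(cov[d_x:, d_x:])
    _, logdet_xy = np.linalg.slogdet(cov)
    # Gaussian closed form: 0.5 * (log|S_X| + log|S_Y| - log|S_XY|).
    return 0.5 * (logdet_x + logdet_y - logdet_xy)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(size=(2000, 4))                    # synthetic "source" vectors
    close = src + 0.1 * rng.normal(size=(2000, 4))      # "summary" that tracks the source
    unrelated = rng.normal(size=(2000, 4))              # "summary" independent of the source
    print(gaussian_mutual_information(src, close) > gaussian_mutual_information(src, unrelated))  # True
```

In this toy setting, a "summary" whose representation tracks the source scores far higher than one that is independent of it. Under the Gaussian assumption, the estimate is non-negative. With high-dimensional real embeddings and few samples, the covariance estimates become unreliable. That is one reason a dedicated mutual-information estimator may be preferable in practice.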

Maxime Darrin, Philippe Formont, Jackie Chi Kit Cheung, Pablo Piantanida • 2024

Related benchmarks

Task	Dataset	Metric	Result
Summarization Evaluation	SummEval	Coherence	23	41
Paraphrase embedding	Paraphrase embedding	Correlation	0.81	12
Emotion Classification	Emotion	Correlation Coefficient	0.56	12
Policy classification	Policy classification	Correlation	0.58	12
GPT detection	GPT detector	Correlation	0.59	12
