
$\texttt{COSMIC}$: Mutual Information for Task-Agnostic Summarization Evaluation

About

Assessing the quality of summarizers poses significant challenges. In response, we propose a novel task-oriented evaluation approach that assesses summarizers by their capacity to produce summaries that are useful for downstream tasks, that is, summaries that preserve task outcomes. We theoretically establish a direct relationship between the error probability of these downstream tasks and the mutual information between source texts and generated summaries. We introduce $\texttt{COSMIC}$ as a practical implementation of this metric, demonstrating its strong correlation with human-judgment-based metrics and its effectiveness in predicting downstream task performance. Comparative analyses against established metrics such as $\texttt{BERTScore}$ and $\texttt{ROUGE}$ show that $\texttt{COSMIC}$ performs competitively.
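The central quantity here is the mutual information I(S; T) between source texts S and summaries T. As a rough illustration only, the sketch below estimates it from paired embeddings, assuming (1) each source and its summary have already been mapped to fixed-size vectors by some encoder, and (2) the embeddings are jointly Gaussian, so I(S; T) = ½ (log det Σ_S + log det Σ_T − log det Σ_ST). This closed-form Gaussian estimator is a simple stand-in, not necessarily the estimator $\texttt{COSMIC}$ actually uses; the function name `gaussian_mutual_information` and the `ridge` parameter are hypothetical.

```python
import numpy as np


def gaussian_mutual_information(x: np.ndarray, y: np.ndarray, ridge: float = 1e-6) -> float:
    """Estimate I(X; Y) in nats, ASSUMING (X, Y) is jointly Gaussian.

    x: (n, d_x) array, e.g. embeddings of n source texts (hypothetical input).
    y: (n, d_y) array, e.g. embeddings of the n corresponding summaries.
    """
    if x.ndim != 2 or y.ndim != 2:
        raise ValueError("x and y must be 2-D arrays of shape (n, d)")
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of rows (paired samples)")

    # Joint sample covariance of the concatenated vectors [x, y].
    cov = np.cov(np.hstack([x, y]), rowvar=False)
    # A small ridge keeps every block positive definite so log-determinants are finite.
    cov = cov + ridge * np.eye(cov.shape[0])

    d_x = x.shape[1]
    _, logdet_x = np.linalg.slogdet(cov[:d_x, :d_x])      # marginal block for X
    _, logdet_y = np.linalg.slogdet(cov[d_x:, d_x:])      # marginal block for Y
    _, logdet_joint = np.linalg.slogdet(cov)              # full joint covariance

    # Gaussian MI: 0.5 * (log|Sigma_X| + log|Sigma_Y| - log|Sigma_XY|).
    return 0.5 * (logdet_x + logdet_y - logdet_joint)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    sources = rng.normal(size=(n, 4))
    # A "summary" that is a noisy linear function of the source keeps much of its information.
    informative = sources @ rng.normal(size=(4, 4)) + 0.1 * rng.normal(size=(n, 4))
    # A "summary" unrelated to the source should carry almost none.
    unrelated = rng.normal(size=(n, 4))
    print(gaussian_mutual_information(sources, informative))
    print(gaussian_mutual_information(sources, unrelated))
```

Under these assumptions, summaries that retain more of the source's content yield a higher estimate, which matches the intuition that higher I(S; T) leaves downstream tasks less room for error. Real text embeddings are rarely Gaussian, so a nonparametric or learned estimator is usually preferable in practice.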

Maxime Darrin, Philippe Formont, Jackie Chi Kit Cheung, Pablo Piantanida • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Summarization Evaluation | SummEval | Coherence | 23 | 41 |
| Paraphrase embedding | Paraphrase embedding | Correlation | 0.81 | 12 |
| Emotion Classification | Emotion | Correlation Coefficient | 0.56 | 12 |
| Policy classification | Policy classification | Correlation | 0.58 | 12 |
| GPT detection | GPT detector | Correlation | 0.59 | 12 |
