MultiNews

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Summarization | MultiNews LongBench (test) | ROUGE-L Score 30.91 | 33 |
| Summarization | MultiNews | F1 Score 34.79 | 28 |
| Summarization | MultiNews (test) | Comprehensiveness 4.98 | 24 |
| Long-context Text Summarization | MultiNews 128K context | ROUGE-L 29.5 | 18 |
| Multi-document Summarization | MultiNews | MAT 1 | 15 |
| Document Summarization | MultiNews | ASR 87 | 14 |
| Long-context summarization | MultiNews | MAT 5.97 | 11 |
| Summarization | MultiNews | ROUGE Score 24.6 | 10 |
| Summarization | MultiNews | ROUGE-1 Std Dev 0.11 | 8 |
| Watermarking | MultiNews | Generation Metric Score 25.86 | 6 |
| Text Summarization Hallucination Evaluation | MultiNews | Accuracy 19 | 6 |
| Latency Evaluation | MultiNews | End-to-End Latency 3.86 | 6 |
| Faithfulness discrimination | MultiNews | AUC 0.755 | 4 |
| Summarization | MultiNews | Accuracy 25.3 | 2 |