CNN Dailymail

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Abstractive Summarization | CNN/DailyMail full length F-1 (test) | ROUGE-1 41.69 | 48 |
| Open ended generation | CNN DailyMail | ROUGE-L 24.3 | 40 |
| Language Generation | CNN/DailyMail | Accuracy 27.16 | 35 |
| Summarization | CNN/DailyMail (test) | ROUGE-L 48 | 33 |
| Uncertainty Quantification | CNN/DailyMail | Hamming AUC 0.745 | 28 |
| Summarization | CNN/DailyMail | Hamming Score -0.276 | 28 |
| Abstractive Summarization | CNN/DailyMail | ROUGE-1 44.51 | 25 |
| Summarization | CNN/DailyMail | RougeL 23.23 | 21 |
| Abstractive Summarization | CNN/DailyMail Summarization | Hamming Distance 1.597 | 20 |
| Reranking | CNN DailyMail | R-1 52.43 | 15 |
| Text Summarization | CNN DailyMail | ROUGE-1 38.58 | 13 |
| Length-Constrained Text Generation | CNN/DailyMail | Win Rate 16.43 | 10 |
| Text Generation | CNN/DailyMail (test) | LCTG Error Rate (E) 3.18 | 10 |
| Text Summarization | CNN/DailyMail (test) | ROUGE-1 33.23 | 9 |
| Context Attribution | CNN Dailymail (1000 examples) | Log Probability Drop 1.48 | 9 |
| News Summarization | CNN DailyMail | BLEU 5.41 | 8 |
| Extractive Summarization | CNN/DailyMail anonymized (test) | ROUGE-1 42.69 | 8 |
| Summarization | CNN/DailyMail | Distribution Time (s) 4.68 | 7 |
| Summarization | CNN/DailyMail 50 document sample (sampled) | PPL 0.3 | 7 |
| Summarization | CNN/DailyMail (evaluation) | ROUGE-1 44.45 | 7 |
| Text Summarization | CNN/DailyMail 100-example subset (ACU protocol) (test) | ACU 0.4421 | 6 |
| Summarization | CNN/DailyMail human evaluation (100 samples) | Relevance Score 43 | 6 |
| Hallucination Detection | CNN/DailyMail | AvgWD 0.694 | 5 |
| Abstractive Summarization | CNN/DailyMail | Baseline Throughput (samples/s) 3.4 | 5 |
| Summarization | CNN/DailyMail | R-1 Score 37.44 | 4 |
Showing 25 of 30 rows