
CNN/DailyMail

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Best feasible prompt identification | CNN/DailyMail (test) | Average Soft Constrained Reward | 0.172 | 72 |
| Abstractive Summarization | CNN/DailyMail full length F-1 (test) | ROUGE-1 | 41.69 | 48 |
| Open-ended generation | CNN DailyMail | ROUGE-L | 24.3 | 40 |
| Pareto prompt set identification | CNN/DailyMail | Hypervolume (HV) | 18.03 | 36 |
| Language Generation | CNN/DailyMail | Accuracy | 27.16 | 35 |
| Summarization | CNN/DailyMail (test) | ROUGE-L | 48 | 33 |
| Uncertainty Quantification | CNN/DailyMail | Hamming AUC | 0.745 | 28 |
| Summarization | CNN/DailyMail | Hamming Score | -0.276 | 28 |
| Abstractive Summarization | CNN/DailyMail | ROUGE-1 | 44.51 | 25 |
| Summarization | CNN DailyMail | PRR | 0.45 | 22 |
| Summarization | CNN/DailyMail | RougeL | 23.23 | 21 |
| Abstractive Summarization | CNN/DailyMail Summarization | Hamming Distance | 1.597 | 20 |
| Text Summarization | CNN/DailyMail | BA | 97.34 | 16 |
| Reranking | CNN DailyMail | R-1 | 52.43 | 15 |
| Context Compression | CNN/DailyMail | ROUGE-1 | 44.89 | 13 |
| Text Summarization | CNN DailyMail | ROUGE-1 | 38.58 | 13 |
| Long-context Modeling | CNN/DailyMail | Speedup | 3.85 | 12 |
| Binary Classification (Assistive vs Creative) | CNN DailyMail | AUC | 99 | 12 |
| Binary Classification (Human vs Creative) | CNN/DailyMail | AUC | 99 | 12 |
| Binary Classification (Human vs Assistive) | CNN/DailyMail | AUC | 0.99 | 12 |
| Attribution Quality Evaluation | CNN DailyMail | Log-Prob Drop | 1.371 | 12 |
| Short-Generation | CNN/DailyMail | ROUGE-1 | 21.15 | 10 |
| Length-Constrained Text Generation | CNN/DailyMail | Win Rate | 16.43 | 10 |
| Text Generation | CNN/DailyMail (test) | LCTG Error Rate (E) | 3.18 | 10 |
| Text Summarization | CNN/DailyMail (test) | ROUGE-1 | 33.23 | 9 |

Showing 25 of 52 rows.