
TL;DR

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Summarization | TL;DR (test) | Win Rate 82.5 | 49 |
| Summarization | TL;DR | Winrate 91.8 | 42 |
| Preference Alignment | TL;DR (test) | Win Rate 68.8 | 36 |
| Summarization | TL;DR (distillation set) | Word Count 27.24 | 16 |
| Reward Modeling | TL;DR Seen (n=100) | Accuracy 62.3 | 14 |
| Summarization | TL;DR | Completeness 43 | 11 |
| Reward Modeling | TL;DR Overall n=150 | Accuracy 62.9 | 7 |
| Reward Modeling | TL;DR Unseen (n=150) | Accuracy 62.4 | 7 |
| Reward Modeling | TL;DR n=150 Seen | Accuracy 63.3 | 7 |
| Reward Modeling | TL;DR n=100 Unseen | Accuracy 61.5 | 7 |
| Summarization | TL;DR | Win Rate 92.8 | 6 |
| Summarization (Groundedness) | TL;DR | Kendall's Tau 0.46 | 5 |
| Summarization (Completeness) | TL;DR | Kendall's Tau 0.44 | 5 |
| Preference Alignment | TL;DR | GRA (%) 64.4 | 4 |
| Summarization | TL;DR | Winrate 50.5 | 4 |
| Summarization Preference Evaluation | TL;DR (val) | Metric - | 0 |
| Text Summarization | TL;DR (test) | Metric - | 0 |