
GovReport

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Language Modeling | GovReport | Perplexity | 2.5 | 50 |
| Document Summarization | GovReport (test) | ROUGE-1 | 75.56 | 50 |
| Long-context Input (Summarization) | GovReport | Time Per Token (TPT) | 6.62 | 20 |
| Summarization | GovReport (val) | ROUGE-L | 47.2 | 17 |
| Document Summarization | GovReport | ROUGE-1 | 60.64 | 15 |
| Summarization | GovReport | Token MAE | 17.4 | 14 |
| Long-Context Summarization | GovReport (test) | ROUGE-1 | 16.11 | 10 |
| Long-Context Summarization | GovReport | ROUGE-1 Score | 17.23 | 10 |
| Long-context Question Answering | GovReport | C Score | 68.4 | 9 |
| Long-context answering with citations | GovReport | Citation Recall | 82.8 | 9 |
| Summarization | GovReport | BERTScore | 0.92 | 7 |
| Summarization | GovReport (test) | ROUGE-1 | 0.555 | 4 |
| Summarization Evaluation (Human Correlation) | GovReport (test) | Relevance | 61 | 4 |
| Pairwise Preference Evaluation | GovReport | Pairwise Win Rate | 52.9 | 3 |
| Summarization | GovReport LongBench (test) | ROUGE-1 Score | 30.91 | 3 |
| Model Ranking | GovReport ROUGE-L (test) | Kendall Tau (τ) | 0.823 | 2 |