
GovReport

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Language Modeling | GovReport | Perplexity | 2.5 | 50 |
| Document Summarization | GovReport (test) | ROUGE-1 | 75.56 | 50 |
| Long-context language generation | GovReport | Average Acceptance Length (τ) | 3.59 | 25 |
| Long-context Input (Summarization) | GovReport | Time Per Token (TPT) | 6.62 | 20 |
| Summarization | GovReport (val) | ROUGE-L | 47.2 | 17 |
| Document Summarization | GovReport | ROUGE-1 | 60.64 | 15 |
| Document Summarization | GovReport | ASR Score | 89 | 14 |
| Summarization | GovReport | Token MAE | 17.4 | 14 |
| Long-Context Summarization | GovReport | ROUGE-1 Score | 22.03 | 12 |
| Long-Context Summarization | GovReport (test) | ROUGE-1 | 16.11 | 10 |
| Document Summarization | GovReport | G-mean | 18.43 | 9 |
| Long-context Question Answering | GovReport | C Score | 68.4 | 9 |
| Long-context answering with citations | GovReport | Citation Recall | 82.8 | 9 |
| Summarization | GovReport | BERTScore | 0.92 | 7 |
| Extractive Summarization | GovReport (GR) | Accuracy | 75.3 | 5 |
| Summarization | GovReport (test) | ROUGE-1 | 0.555 | 4 |
| Summarization Evaluation (Human Correlation) | GovReport (test) | Relevance | 61 | 4 |
| Long-Context Summarization | GovReport | ROUGE-L | 22.03 | 3 |
| Pairwise Preference Evaluation | GovReport | Pairwise Win Rate | 52.9 | 3 |
| Summarization | GovReport LongBench (test) | ROUGE-1 Score | 30.91 | 3 |
| Summarization | GovReport | ROUGE Score | 30.9 | 2 |
| Model Ranking | GovReport ROUGE-L (test) | Kendall Tau (τ) | 0.823 | 2 |