GovReport

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Language Modeling | GovReport | Perplexity 2.5 | 50 |
| Document Summarization | GovReport (test) | ROUGE-1 75.56 | 50 |
| Long-context language generation | GovReport | Speedup (×) 3.36 | 32 |
| Long-context Input (Summarization) | GovReport | Speedup 3.09 | 31 |
| Summarization | GovReport | ROUGE Score 46 | 30 |
| Summarization | GovReport (val) | ROUGE-L 47.2 | 17 |
| Document Summarization | GovReport | ROUGE-1 60.64 | 15 |
| Document Summarization | GovReport | ASR Score 89 | 14 |
| Summarization | GovReport | Token MAE 17.4 | 14 |
| Context Compression | GovReport | ROUGE-1 61.39 | 13 |
| Summarization | GovReport (test) | ROUGE-1 0.555 | 13 |
| Summarization | GovReport 16K | 95th Percentile Latency (ms) 118 | 12 |
| Long-Context Summarization | GovReport | ROUGE-1 Score 22.03 | 12 |
| Prompt Injection Attack | GovReport | Attack Success Rate (ASR) 0 | 11 |
| Long-Context Summarization | GovReport (test) | ROUGE-1 16.11 | 10 |
| Plain Summarization | GovReport | ROUGE-1 46.93 | 9 |
| Document Summarization | GovReport | G-mean 18.43 | 9 |
| Long-context Question Answering | GovReport | C Score 68.4 | 9 |
| Long-context answering with citations | GovReport | Citation Recall 82.8 | 9 |
| Summarization | GovReport | BERTScore 0.92 | 7 |
| Extractive Summarization | GovReport (GR) | Accuracy 75.3 | 5 |
| Text Simplification | GovReport 500 samples (human evaluation) | Coherence 4.27 | 4 |
| Prompt Injection Attack | GovReport | ASR 100 | 4 |
| Summarization Evaluation (Human Correlation) | GovReport (test) | Relevance 61 | 4 |
| Summarization | GovReport SCROLLS (test) | ROUGE-1 48.18 | 3 |
Showing 25 of 30 rows