E2E

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Natural Language Generation | E2E (test) | ROUGE-L 89.94 | 79 |
| Data-to-text generation | E2E | ROUGE-L 0.717 | 36 |
| Data-to-Text Generation | E2E (test) | BLEU 68.23 | 33 |
| Dialogue Generation | E2E | BLEU 64.81 | 10 |
| Hallucination Detection | E2E (test) | F1-R 90 | 10 |
| Natural Language Generation | E2E en | ROUGE-2 45.8 | 9 |
| Data-to-text generation | Cleaned E2E (test) | BLEU 44.15 | 9 |
| Data-to-Text Generation | E2E | ROUGE-2 37.6 | 8 |
| Natural Language Generation | E2E | BLEU 70.4 | 7 |
| Vulnerability Attack Performance | E2E Cross-domain (dev) | ASR (E-commerce) (CodeQL) 0.8333 | 6 |
| Vulnerability Attack Performance | E2E (dev) | ASR (Semgrep) - E-commerce 100 | 6 |
| Span-level classification | E2E (test) | F1 Score (Span) 56 | 6 |
| Generation from meaning representations | E2E (test) | BLEU 0.686 | 6 |
| Semantic Content Control | E2E (test) | Control Score 89.9 | 5 |
| Error Tracing | E2E Synthetic Hallucination | auPR (England-China) 94.14 | 5 |
| Hallucination Retrieval | E2E dataset | AuPR 71.6 | 5 |
| Controllable Text Generation | E2E Syntax Spans | Ctrl 95.33 | 5 |
| Controllable Text Generation | E2E Semantic Content | Ctrl 85.06 | 5 |
| Factual Correctness | E2E Original (test) | Add 0.14 | 5 |
| Data-to-Text Generation | E2E Cleaned | Fluency 5.46 | 5 |
| Drought Impact Extraction | E2E dataset | Accuracy 78.8 | 4 |
| Natural Language Generation | E2E | GFLOPs per token 4.65 | 4 |
| Data-to-Text Generation | E2E clean MR | BLEU 22.6 | 4 |
| Data-to-text generation | E2E+ | BLEU 0.6292 | 3 |
| Data-To-Text | GEM E2E en (test) | ROUGE-2 - | 0 |
Showing 25 of 26 rows