ELI5

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Long Form Question Answering | ELI5 (test) | ROUGE-L 27.13 | 54 |
| Watermarking | eli5-category (test) | PPL 1.313 | 28 |
| Long Form Question Answering | ELI5 | ROUGE-L 25.57 | 27 |
| Attributed Text Generation | ELI5 | Claim Correctness Score 25.9 | 19 |
| Text generation quality and watermark detectability | ELI5 | AUC 100 | 16 |
| Word-level length-controlled question answering | ELI5 (test) | MAE 14.1 | 14 |
| Question Answering | ELI5 Wiki-answerable | ROUGE-L Score 26.6 | 14 |
| Sentence-level length-controlled question answering | ELI5 (test) | MAE 0.08 | 12 |
| Token-level length-controlled question answering | ELI5 (test) | MAE 10.33 | 12 |
| Long-Form Question Answering | ELI5 (val) | F1 31.5 | 11 |
| Sentence-level attribution | ELI5 (test) | Citation Recall 81.9 | 10 |
| Grounded Generation | ELI5 (test) | Fluency (mauve) 47.43 | 10 |
| Knowledge-intensive generation | ELI5 (dev) | ROUGE-L Score 26.6 | 9 |
| Long-form Question Answering | ELI5 KILT (test) | F1 25.4 | 8 |
| Long-form Question Answering with Citations | ELI5 | Correctness 0.186 | 8 |
| Retrieval | ELI5 KILT (test) | Retrieval Precision 11 | 8 |
| Open-domain QA | ELI5 | R-L 20.75 | 8 |
| Fine-grained passage-level retrieval | ELI5 | Answer in Context 16.85 | 7 |
| Instruction-following | ELI5 prompts Gemma-7B-it 200 tokens (test) | Perplexity 1.7784 | 6 |
| Question Answering | ELI5 | B-1 Score 27.9 | 6 |
| Attributed Question Answering | ELI5 (test) | Rouge-L 21.3 | 5 |
| Long-form Question Answering refinement | ELI5 (test) | Error Rate 0.0381 | 5 |
| Long-form Question Answering | ELI5 standard original | RL Score 26.9 | 5 |
| Natural Language Generation | ELI5 | Relevance (Mean) 5.14 | 5 |
| Knowledge-grounded Generation | ELI5 ALCE (test) | Correctness 10.5 | 4 |
Showing 25 of 30 rows