ELI5

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Long Form Question Answering | ELI5 (test) | ROUGE-L 27.13 | 54 |
| Long Form Question Answering | ELI5 | Performance Score 78.4 | 32 |
| Watermarking | eli5-category (test) | PPL 1.313 | 28 |
| Dialectal alignment estimation | ELI5 informal | Percentage of AmE Generations 0.77 | 20 |
| Attributed Text Generation | ELI5 | Claim Correctness Score 25.9 | 19 |
| Text generation quality and watermark detectability | ELI5 | AUC 100 | 16 |
| Attribution | ELI5 | Precision 55.8 | 15 |
| Word-level length-controlled question answering | ELI5 (test) | MAE 14.1 | 14 |
| Question Answering | ELI5 Wiki-answerable | ROUGE-L Score 26.6 | 14 |
| Sentence-level length-controlled question answering | ELI5 (test) | MAE 0.08 | 12 |
| Token-level length-controlled question answering | ELI5 (test) | MAE 10.33 | 12 |
| Long-Form Question Answering | ELI5 (val) | F1 31.5 | 11 |
| Sentence-level attribution | ELI5 (test) | Citation Recall 81.9 | 10 |
| Grounded Generation | ELI5 (test) | Fluency (mauve) 47.43 | 10 |
| Knowledge-intensive generation | ELI5 (dev) | ROUGE-L Score 26.6 | 9 |
| Long-form Question Answering | ELI5 KILT (test) | F1 25.4 | 8 |
| Long-form Question Answering with Citations | ELI5 | Correctness 0.186 | 8 |
| Retrieval | ELI5 KILT (test) | Retrieval Precision 11 | 8 |
| Open-domain QA | ELI5 | R-L 20.75 | 8 |
| Fine-grained passage-level retrieval | ELI5 | Answer in Context 16.85 | 7 |
| Instruction-following | ELI5 prompts Gemma-7B-it 200 tokens (test) | Perplexity 1.7784 | 6 |
| Question Answering | ELI5 | B-1 Score 27.9 | 6 |
| Attributed Question Answering | ELI5 (test) | Rouge-L 21.3 | 5 |
| Long-form Question Answering refinement | ELI5 (test) | Error Rate 0.0381 | 5 |
| Long-form Question Answering | ELI5 standard original | RL Score 26.9 | 5 |
Showing 25 of 32 rows