ELI5

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Long Form Question Answering | ELI5 | ROUGE-L 31.15 | 57 |
| Long Form Question Answering | ELI5 (test) | ROUGE-L 27.13 | 54 |
| Watermarking | eli5-category (test) | PPL 1.313 | 28 |
| Dialectal alignment estimation | ELI5 informal | Percentage of AmE Generations 0.77 | 20 |
| Attributed Text Generation | ELI5 | Claim Correctness Score 25.9 | 19 |
| Text generation quality and watermark detectability | ELI5 | AUC 100 | 16 |
| Attribution | ELI5 | Precision 55.8 | 15 |
| Word-level length-controlled question answering | ELI5 (test) | MAE 14.1 | 14 |
| Question Answering | ELI5 Wiki-answerable | ROUGE-L Score 26.6 | 14 |
| Sentence-level length-controlled question answering | ELI5 (test) | MAE 0.08 | 12 |
| Token-level length-controlled question answering | ELI5 (test) | MAE 10.33 | 12 |
| Long-Form Question Answering | ELI5 (val) | F1 31.5 | 11 |
| Sentence-level attribution | ELI5 (test) | Citation Recall 81.9 | 10 |
| Grounded Generation | ELI5 (test) | Fluency (mauve) 47.43 | 10 |
| Knowledge-intensive generation | ELI5 (dev) | ROUGE-L Score 26.6 | 9 |
| Distortion-Free Watermark Evaluation | ELI5 16-bit bitstrings LLAMA3.1-8B | Message Accuracy 77.8 | 8 |
| Long-form Question Answering | ELI5 KILT (test) | F1 25.4 | 8 |
| Long-form Question Answering with Citations | ELI5 | Correctness 0.186 | 8 |
| Retrieval | ELI5 KILT (test) | Retrieval Precision 11 | 8 |
| Open-domain QA | ELI5 | R-L 20.75 | 8 |
| Paraphrasing Robustness | ELI5 prompts 32-bit payload Llama3.1-8B (test) | Bit Accuracy 50.9 | 7 |
| Synonym Substitution Robustness | ELI5 prompts 32-bit payload Llama3.1-8B (test) | Bit Accuracy (10% Substitution) 70.7 | 7 |
| Word Deletion Robustness | ELI5 prompts 32-bit payload Llama3.1-8B (test) | Bit Accuracy (10% Deletion) 70.7 | 7 |
| Fine-grained passage-level retrieval | ELI5 | Answer in Context 16.85 | 7 |
| Distortion-Free Watermark Evaluation | ELI5 64-bit bitstrings LLAMA3.1-8B | Message Accuracy 0 | 6 |
Showing 25 of 43 rows