| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Long Form Question Answering | ELI5 | ROUGE-L 31.15 | 57 | |
| Long Form Question Answering | ELI5 (test) | ROUGE-L 27.13 | 54 | |
| Watermarking | eli5-category (test) | PPL 1.313 | 28 | |
| Dialectal alignment estimation | ELI5 informal | Percentage of AmE Generations 0.77 | 20 | |
| Attributed Text Generation | ELI5 | Claim Correctness Score 25.9 | 19 | |
| Text generation quality and watermark detectability | ELI5 | AUC 100 | 16 | |
| Attribution | ELI5 | Precision 55.8 | 15 | |
| Word-level length-controlled question answering | ELI5 (test) | MAE 14.1 | 14 | |
| Question Answering | ELI5 Wiki-answerable | ROUGE-L Score 26.6 | 14 | |
| Sentence-level length-controlled question answering | ELI5 (test) | MAE 0.08 | 12 | |
| Token-level length-controlled question answering | ELI5 (test) | MAE 10.33 | 12 | |
| Long-Form Question Answering | ELI5 (val) | F1 31.5 | 11 | |
| Sentence-level attribution | ELI5 (test) | Citation Recall 81.9 | 10 | |
| Grounded Generation | ELI5 (test) | Fluency (mauve) 47.43 | 10 | |
| Knowledge-intensive generation | ELI5 (dev) | ROUGE-L Score 26.6 | 9 | |
| Distortion-Free Watermark Evaluation | ELI5 16-bit bitstrings LLAMA3.1-8B | Message Accuracy 77.8 | 8 | |
| Long-form Question Answering | ELI5 KILT (test) | F1 25.4 | 8 | |
| Long-form Question Answering with Citations | ELI5 | Correctness 0.186 | 8 | |
| Retrieval | ELI5 KILT (test) | Retrieval Precision 11 | 8 | |
| Open-domain QA | ELI5 | R-L 20.75 | 8 | |
| Paraphrasing Robustness | ELI5 prompts 32-bit payload Llama3.1-8B (test) | Bit Accuracy 50.9 | 7 | |
| Synonym Substitution Robustness | ELI5 prompts 32-bit payload Llama3.1-8B (test) | Bit Accuracy (10% Substitution) 70.7 | 7 | |
| Word Deletion Robustness | ELI5 prompts 32-bit payload Llama3.1-8B (test) | Bit Accuracy (10% Deletion) 70.7 | 7 | |
| Fine-grained passage-level retrieval | ELI5 | Answer in Context 16.85 | 7 | |
| Distortion-Free Watermark Evaluation | ELI5 64-bit bitstrings LLAMA3.1-8B | Message Accuracy 0 | 6 | |