| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Long-form Question Answering with Citations | ASQA | EM 45.01 | 37 |
| Question Answering | ASQA (test) | Correctness EM Recall 40.05 | 29 |
| Question Answering | ASQA | StrEM 44.73 | 27 |
| Attributed Text Generation | ASQA | Correctness (EM Rec.) 50.1 | 19 |
| Long-form Question Answering | ASQA | str-em 51.3 | 15 |
| Question Answering | ASQA (in-domain) | EM 47.21 | 12 |
| Completeness | ASQA | Kendall's Tau 0.54 | 11 |
| Retrieval-Augmented Generation | ASQA | str-EM 42.44 | 11 |
| Sentence-level attribution | ASQA (test) | Citation Recall 87.2 | 10 |
| RAG-Completeness | ASQA (test) | Kendall's Tau 0.54 | 6 |
| Long-form Question Answering refinement | ASQA (test) | Error Rate (%) 16.63 | 5 |
| Open-Domain Question Answering | ASQA (dev) | STR-EM 37.22 | 4 |
| Knowledge-grounded Generation | ASQA ALCE (test) | Correctness 31.8 | 4 |
| Attributed Question Answering | ASQA ALCE (dev) | FSupp 88.58 | 3 |