| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Question Answering | RQA | ASR | 88.2 | 130 |
| Retrieval Question Answering | RQA | Accuracy | 76 | 72 |
| Question Answering | RQA (test) | Accuracy | 79 | 60 |
| Question Answering | RQA MC | RACC (Accuracy) | 77.2 | 58 |
| Robust Question Answering | RQA Evolving evidence streams GPT-4o (test) | Accuracy | 72.68 | 24 |
| RAG Poisoning Attack Mitigation | RQA | ASR (PIA) | 1 | 15 |
| Question Answering | RQA poison @ Position 10, k=10 (test) | Robustness Accuracy | 76 | 15 |
| Question Answering | RQA (poison @ Position 1, k=10) (test) | Robustness Accuracy | 0.7 | 15 |
| RAG Robustness | RQA | Paradox RACC | 66.4 | 12 |
| RAG Robustness | RQA-MC | Paradox RACC | 80 | 12 |
| Short-answer QA | RQA | Accuracy | 71 | 8 |
| Short-form open-domain QA | RQA | PIA Racc Score | 72 | 6 |
| Multiple-choice QA | RQA-MC | Accuracy | 81 | 6 |