| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Question Answering | 2WIKI | EM 86 | 241 |
| Multi-hop Question Answering | 2Wiki | Exact Match 74.9 | 215 |
| Class-level Continual Learning | 2Wiki | Average Accuracy (AA) 71.74 | 56 |
| Question Answering | 2Wiki (test) | EM Accuracy 61.8 | 49 |
| Multi-Hop Question Answering | 2Wiki | Accuracy (2Wiki) 43.6 | 44 |
| Multi-hop QA | 2Wiki | EM 62 | 42 |
| Retrieval | 2Wiki | Recall@5 96.13 | 42 |
| Multi-hop Question Answering | 2Wiki (test) | F1 Score 69.7 | 34 |
| Adversarial Attack | 2Wiki | ASR 78.8 | 30 |
| Question Answering | 2Wiki 100K context | Accuracy 78.91 | 25 |
| Multi-hop QA Retrieval | 2Wiki | Recall@2 90.1 | 23 |
| Multi-hop Question Answering | 2Wiki 18 | Exact Match (EM) 43.6 | 20 |
| Multi-Hop Question Answering | 2Wiki | Token-Level F1 58.7 | 20 |
| Question Answering | 2Wiki 30K context | Accuracy 73.7 | 19 |
| Question Answering | 2Wiki 10K context | Accuracy 72.2 | 19 |
| Multi-hop Question Answering | 2Wiki | pass@1 58 | 18 |
| Multi-Hop Question Answering | 2Wiki | Exact Match (EM) 46.9 | 18 |
| Multi-Hop QA | 2Wiki | Accuracy 70.5 | 17 |
| Multi-hop Question Answering | 2Wiki | MBE 59 | 17 |
| Multi-Hop Search-augmented Question Answering | 2Wiki | Success Rate 35.6 | 16 |
| Multi-hop Question Answering | 2Wiki | EM 48.89 | 16 |
| Multi-Hop QA Verification | 2wiki | P@1 81.21 | 16 |
| Monolingual Question Answering | 2WIKI | fEM 61.13 | 14 |
| Question Answering | 2Wiki (In-Distribution) | Accuracy 77 | 14 |
| Multi-Hop Question Answering | 2wiki 2018 Wikipedia dump (dev) | Accuracy (%) 43.6 | 14 |