| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Question Answering | 2WIKI | F1 91.79 | 152 | |
| Multi-hop Question Answering | 2Wiki | Exact Match 74.9 | 152 | |
| Adversarial Attack | 2Wiki | ASR 78.8 | 30 | |
| Multi-hop QA | 2Wiki | EM 62 | 26 | |
| Multi-hop QA Retrieval | 2Wiki | Recall@2 90.1 | 23 | |
| Multi-hop Question Answering | 2Wiki (test) | F1 Score 69.7 | 20 | |
| Retrieval | 2Wiki | Recall@5 87 | 19 | |
| Question Answering | 2Wiki 30K context | Accuracy 73.7 | 19 | |
| Question Answering | 2Wiki 10K context | Accuracy 72.2 | 19 | |
| Question Answering | 2Wiki 100K context | Accuracy 65.5 | 18 | |
| Multi-Hop QA | 2Wiki | Accuracy 70.5 | 17 | |
| Multi-hop Question Answering | 2Wiki | MBE 59 | 17 | |
| Multi-Hop QA Verification | 2wiki | P@1 81.21 | 16 | |
| Multi-Hop Question Answering | 2wiki 2018 Wikipedia dump (dev) | Accuracy (%) 43.6 | 14 | |
| Question Answering | 2WIKI (val) | EM 27.2 | 14 | |
| Question Answering | 2WIKI (out-of-domain) | EM 40 | 14 | |
| Question Answering | 2Wiki 500 samples (val) | EM 39.6 | 14 | |
| Multi-Hop Question Answering | 2Wiki | CEM (%) 71.7 | 12 | |
| Multi-hop Question Answering | 2Wiki | Token Count 5,583 | 12 | |
| Question Answering | 2Wiki (test) | EM Accuracy 61.8 | 12 | |
| Query Rewriting & QA | 2Wiki BM25 | F1 33.6 | 12 | |
| Open-domain Question Answering | 2WIKI | Accuracy 48.9 | 11 | |
| Multi-Hop Question Answering | 2Wiki (out-of-domain) | Accuracy 42 | 10 | |
| Expected Calibration Error | 2Wiki | ECE 17.43 | 10 | |
| Multi-Hop QA | 2Wiki (test) | EM 57.5 | 10 |