| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Multi-hop Question Answering | 2WikiMultiHopQA | EM 82.1 | 387 | |
| Multi-hop Question Answering | 2WikiMultiHopQA (test) | EM 73.9 | 195 | |
| Question Answering | 2WikiMultihopQA | EM 47.7 | 107 | |
| Question Answering | 2WikiMultihopQA (test) | F1 78.9 | 81 | |
| Multi-hop Question Answering | 2WikiMultiHopQA Out-Of-Distribution (OOD) | Accuracy 74.2 | 72 | |
| Long-context Question Answering | 2WikiMultiHopQA (Out-Of-Distribution) | Accuracy 63.9 | 54 | |
| Multi-hop QA Retrieval | 2WikiMultihopQA (test) | R@5 97.2 | 33 | |
| Question Answering | 2WikiMultihopQA LongBench | F1 Score 53.61 | 28 | |
| Question Answering | 2WikiMultihopQA | Accuracy 62.5 | 25 | |
| Multi-hop Question Answering | 2WikiMultiHopQA (val) | ASR 95.4 | 24 | |
| Multi-hop Question Answering | 2WikiMultiHopQA N=200 | Judge EM 77 | 24 | |
| Knowledge composition selection | 2WikiMultihopQA | Precision @ K=2 100 | 23 | |
| Latent multi-hop reasoning | 2WikiMultiHopQA | Precision 96.86 | 22 | |
| Multi-hop Question Answering | 2WikiMultiHopQA Full | Accuracy (C) 87.5 | 22 | |
| Retrieval | 2WikiMultiHopQA v1 (test) | R@2E85 | 21 | |
| Question Answering | 2WikiMultihopQA | LLM-Acc 89.7 | 20 | |
| End-to-end Question Answering | 2WikiMultiHopQA (test val) | EM 35.44 | 20 | |
| Knowledge-Intensive Reasoning | 2wikiMultiHopQA | F1 Score 76.1 | 18 | |
| Knowledge-Intensive Reasoning | 2WikiMultiHopQA | Accuracy 48.8 | 18 | |
| Multi-hop Question Answering | 2WikiMultiHopQA (dev test) | F1 Score 81.5 | 17 | |
| Multi-hop Question Answering | 2WikiMultiHopQA (2WikiMQA) (official evaluation) | Exact Match (EM) 31.8 | 17 | |
| Multi-Hop Question Answering | 2WikiMultiHopQA out-of-domain (val test) | Exact Match (EM) 51.7 | 15 | |
| Agentic Search | 2WikiMultiHopQA | String-F1 69.9 | 14 | |
| Multi-hop Question Answering | 2WikiMultiHopQA in-domain (test) | Accuracy (Response) 69.8 | 14 | |
| Question Answering | 2WikiMultiHopQA December 2018 Wikipedia dump (test) | EM 28.6 | 14 |