| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multi-hop Question Answering | 2WikiMHQA | F1 Score | 85.56 | 73 |
| Multi-hop Reasoning | 2WikiMHQA | AUROC | 0.7002 | 26 |
| Multi-hop Question Answering | 2WikiMHQA (test) | EM | 71.85 | 17 |
| Multi-hop Question Answering | 2WikiMHQA in-distribution | Exact Match (EM) | 79.39 | 17 |
| Question Answering | 2WikiMHQA Covered Questions | Accuracy | 66.9 | 8 |
| Question Answering | 2WikiMHQA All Questions | Accuracy | 61 | 8 |
| Multi-hop QA | 2WikiMHQA | Average Violation | 0.56 | 4 |
| Question Answering | 2WikiMHQA KET-RAG | Accuracy | 64.8 | 2 |
| Multi-hop Question Answering | 2WikiMHQA in-distribution v4 (test) | EM | – | 0 |