| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| MultiHop-RAG (test) | ScalDPP | NDCG@10 | 63.26 | 24 | 12d ago |
| HotpotQA | G-reasoner | Recall@2 | 85.9 | 23 | 1mo ago |
| Average (MuSiQue, 2Wiki, HotpotQA) | GFM-RAG | R@2 | 72.5 | 10 | 1mo ago |
| Multi-hop 4 datasets aggregate (test) | Ours | NDCG@10 | 58.5 | 8 | 1mo ago |
| HotPotQA | FrugalRAG-7B | Recall | 70.4 | 6 | 1mo ago |
| HotpotQA Fullwiki (test) | MDR | Retrieval EM | 85.6 | 4 | 1mo ago |
| 2Wiki (test) | PHASEGRAPH | LASTHOP@5 | 53.6 | 3 | 18d ago |
| MuSiQue (test) | | LASTHOP@5 | 76.8 | 3 | 18d ago |