| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| HotPotQA | CoopRAG (GPT-4o-mini) | Exact Match | 65.6 | 143 | 9d ago |
| MuSiQue | SLEA-RL | EM | 77.2 | 95 | 9d ago |
| 2WikiMultihopQA | SDGA-Phase | Exact Match (EM) | 56.4 | 67 | 21d ago |
| Bamboogle | SDGA-Phase | Exact Match (EM) | 57.8 | 46 | 21d ago |
| 2Wiki | HELP | EM | 62 | 42 | 9d ago |
| Bamboogle | Search-o1 | EM | 56 | 27 | 2mo ago |
| Bamboogle | IGPO | Accuracy (%) | 74.9 | 25 | 2mo ago |
| StrategyQA (SQA) | SearChain | Cover-EM | 76.95 | 20 | 9d ago |
| 2Wiki | SLEA-RL | Accuracy | 70.5 | 17 | 2mo ago |
| FictionalHot | ReSeek | Exact Match (EM) | 6.1 | 16 | 23d ago |
| Musique | ReSeek | Exact Match (EM) | 18.5 | 16 | 23d ago |
| DROP (test) | DenoiseFlow | F1 Score | 87.9 | 14 | 3mo ago |
| HotpotQA (test val) | EXTAGENTS | F1 Score | 59.7 | 11 | 1mo ago |
| 2Wiki (test) | IGMiRAG | EM | 57.5 | 10 | 14d ago |
| CompWebQ | CBR | Accuracy | 70.4 | 9 | 3mo ago |
| WebQSP | GPT-4+RFKG-CoT | Accuracy | 91.5 | 9 | 3mo ago |
| Zh.QA | EXTAGENTS | F1 Score | 48.2 | 8 | 1mo ago |
| En.QA | EXTAGENTS | F1 | 38.2 | 8 | 1mo ago |
| Bamboogle | SEARL | Pass@1 | 30.4 | 6 | 1mo ago |
| 2wiki | SEARL | pass@1 | 36 | 6 | 1mo ago |
| HotpotQA | DAPO | pass@1 | 33.5 | 6 | 1mo ago |
| Assembly Knowledge Graph QA Multi-hop (test) | AssemMate | nLCS | 66.7 | 5 | 3mo ago |
| Bamboogle | ReSearch | F1 Score | 53.61 | 5 | 3mo ago |
| Bamboogle | Step-Level | F1 Score | 72 | 4 | 9d ago |
| MuSiQue | SAVER | Avg Violation | 0.83 | 4 | 1mo ago |