| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| HotPotQA | CoopRAG (GPT-4o-mini) | Exact Match | 65.6 | 76 | 29d ago |
| MuSiQue | SLEA-RL | EM | 77.2 | 65 | 29d ago |
| Bamboogle | Search-o1 | EM | 56 | 27 | 1mo ago |
| 2Wiki | HELP | EM | 62 | 26 | 1mo ago |
| Bamboogle | IGPO | Accuracy (%) | 74.9 | 25 | 15d ago |
| 2WikiMultihopQA | TaSR-RAG | F1 Score | 66.2 | 23 | 1mo ago |
| StrategyQA (SQA) | SearChain | Cover-EM | 76.95 | 20 | 1mo ago |
| 2Wiki | SLEA-RL | Accuracy | 70.5 | 17 | 29d ago |
| DROP (test) | DenoiseFlow | F1 Score | 87.9 | 14 | 1mo ago |
| 2Wiki (test) | IGMiRAG | EM | 57.5 | 10 | 1mo ago |
| CompWebQ | CBR | Accuracy | 70.4 | 9 | 1mo ago |
| WebQSP | GPT-4+RFKG-CoT | Accuracy | 91.5 | 9 | 1mo ago |
| Bamboogle | SEARL | Pass@1 | 30.4 | 6 | 8d ago |
| 2wiki | SEARL | pass@1 | 36 | 6 | 8d ago |
| HotpotQA | DAPO | pass@1 | 33.5 | 6 | 8d ago |
| Assembly Knowledge Graph QA Multi-hop (test) | AssemMate | nLCS | 66.7 | 5 | 1mo ago |
| Bamboogle | ReSearch | F1 Score | 53.61 | 5 | 1mo ago |
| MuSiQue | SAVER | Avg Violation | 0.83 | 4 | 8d ago |
| 2WikiMHQA | SAVER | Average Violation | 0.56 | 4 | 8d ago |
| 2WikiMultihopQA | SwiR | Pass@1 Accuracy | 81.5 | 4 | 1mo ago |
| LV-Eval | HELP | EM | 10.5 | 3 | 1mo ago |