| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| MultiHopRAG (test val) | SetR-CoT & IRI | Accuracy 47.14 | 20 | 3d ago | |
| MuSiQue (test val) | SetR-CoT & IRI | EM 10.79 | 20 | 3d ago | |
| 2WikiMultiHopQA (test val) | SetR-CoT & IRI | EM 35.44 | 20 | 3d ago | |
| HotpotQA (test val) | SetR-Selection only | EM 36.68 | 20 | 3d ago | |
| HotpotQA official Wikipedia paragraphs | HopRetriever | EM 67.1 | 9 | 3d ago | |
| MMCoQA (test) | LILaC (w/ MM-Embed) | EM 36.31 | 7 | 3d ago | |
| MultimodalQA (test) | LILaC (w/ MM-Embed) | EM 44.57 | 7 | 3d ago | |
| InfoVQA (test) | LILaC (w/ MM-Embed) | EM 60.91 | 7 | 3d ago | |
| MP-DocVQA (test) | LILaC (w/ MM-Embed) | EM 65.48 | 7 | 3d ago | |
| OTT-QA | ARM | Accuracy 0.317 | 6 | 3d ago | |
| Bird | ARM | Accuracy 20.6 | 6 | 3d ago | |
| InfoSeek 25-sample perturbed subset | | Rotation 52 | 4 | 3d ago | |
| HotpotQA ANCE | ACQO | MAP@10 49.6 | 3 | 3d ago | |
| 3+ hop challenge questions (official Wikipedia paragraphs) | IRRR | EM 32.5 | 3 | 3d ago | |