Our new X account is live! Follow @wizwand_team for updates
SOTA Multi-hop Question Answering benchmarks and papers with code | Wizwand
Multi-hop Question Answering
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
| --- | --- | --- | --- | --- | --- |
| 2WikiMultiHopQA | CoopRAG | EM | 71.7 | 278 | 2d ago |
| HotpotQA | ARISE | F1 Score | 75.39 | 221 | 2d ago |
| HotpotQA (test) | Gemini-2.5-Flash | F1 | 80.79 | 198 | 2d ago |
| 2WikiMQA | CIRAG | F1 Score | 76.4 | 154 | 3d ago |
| 2WikiMultiHopQA (test) | QuCo-RAG | EM | 64.6 | 143 | 3d ago |
| MuSiQue (test) | MultiCube-RAG | F1 | 50.9 | 111 | 3d ago |
| Musique | ARISE | EM | 40.5 | 106 | 2d ago |
| Bamboogle | Agentic-R | Exact Match | 48 | 97 | 3d ago |
| HotpotQA | CIRAG | F1 | 74.9 | 79 | 3d ago |
| 2WikiMultiHopQA Out-Of-Distribution (OOD) | QwenLong-L1-32B | Accuracy | 74.2 | 72 | 3d ago |
| LoCoMo | Membox | F1 | 48.35 | 67 | 2d ago |
| Multi-hop RAG | IndexLM-4B | F1 | 86.07 | 65 | 3d ago |
| HotpotQA fullwiki setting (test) | IDRQA | Answer F1 | 75.9 | 64 | 3d ago |
| HotpotQA | Agentic-R | Exact Match (EM) | 47.68 | 56 | 3d ago |
| 2WikiMHQA | CRAFT7B | F1 Score | 85.56 | 55 | 2d ago |
| Bamboogle | TC+FM* | Accuracy | 75.2 | 52 | 3d ago |
| HotpotQA | ACPS | F1 | 75.97 | 48 | 3d ago |
| Bamboogle (test) | Workflow-R1-Search | EM | 57.6 | 46 | 3d ago |
| Multi-Hop QA (HotpotQA, 2Wiki, Musique, Bamboogle) (test) | JADE | HotpotQA Score | 57.02 | 44 | 3d ago |
| HotpotQA (dev) | SEQGRAPH | Answer F1 | 81.62 | 43 | 3d ago |
| 2Wiki | ChainRAG (CxtInt) | F1 Score | 70.58 | 41 | 3d ago |
| HotpotQA | Stable-RAG | SubEM | 39.12 | 40 | 3d ago |
| Multi-Hop QA (HotpotQA, 2Wiki, Musique, Bamboogle) | Search-R1-GRPO + LLDS | HotpotQA Score | 49.2 | 39 | 3d ago |
| HotpotQA fullwiki setting (dev) | PATHFID+ | Answer F1 | 81.5 | 38 | 3d ago |
| Bamboogle | GraphAnchor | EM | 32.23 | 37 | 3d ago |
Showing 25 of 189 rows
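The listing names its metrics (EM, F1) without defining them, and the papers behind each row may compute them differently. As a rough orientation only, the sketch below follows the widely used SQuAD-style convention for extractive QA: answers are normalized, then compared either exactly (EM) or by token overlap (F1). The function names `normalize_answer`, `exact_match`, and `token_f1` are hypothetical, and nothing here is confirmed to match how any listed score was produced.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; individual benchmarks may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

These functions return per-example scores in the range 0 to 1. A dataset-level figure would typically be the mean over all questions, multiplied by 100 if reported as a percentage.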