SOTA Multi-hop Question Answering benchmarks and papers with code | Wizwand
Multi-hop Question Answering
Benchmarks
| Dataset Name | SOTA Method | Metric | Score | Results | Last Updated |
|---|---|---|---|---|---|
| 2WikiMultiHopQA | SynPlanResearch-R1 | EM | 82.1 | 559 | 13d ago |
| HotpotQA (test) | Gemini-2.5-Flash | F1 | 80.79 | 311 | 17h ago |
| HotpotQA | SAFE | F1 Score | 77.4 | 294 | 1mo ago |
| 2WikiMultiHopQA (test) | CJ | EM | 73.9 | 226 | 1d ago |
| 2Wiki | G-reasoner | Exact Match | 74.9 | 215 | 25d ago |
| Musique | INSES + Router | EM | 46 | 209 | 22h ago |
| 2WikiMQA | CIRAG | F1 Score | 76.4 | 161 | 8d ago |
| HotpotQA | Tree-GRPO | Exact Match (EM) | 50.2 | 150 | 13d ago |
| Bamboogle | EvalAct | Exact Match | 56 | 128 | 1mo ago |
| MuSiQue (test) | SEARCH-R | F1 | 55.68 | 128 | 17h ago |
| LoCoMo | Membox | F1 | 48.35 | 125 | 7d ago |
| Bamboogle (test) | Workflow-R1-Search | EM | 57.6 | 98 | 1mo ago |
| HotpotQA | CIRAG | F1 | 74.9 | 79 | 3mo ago |
| Multi-hop RAG | IndexLM-4B | F1 | 86.07 | 77 | 1mo ago |
| 2WikiMHQA | CRAFT7B | F1 Score | 85.56 | 73 | 1mo ago |
| HotpotQA | INSES + Router | LLM Judge Score | 80 | 72 | 22h ago |
| 2WikiMultiHopQA Out-Of-Distribution (OOD) | QwenLong-L1-32B | Accuracy | 74.2 | 72 | 3mo ago |
| 2WikiQA (test) | MultiCube-RAG | F1 | 74.3 | 71 | 21d ago |
| HotpotQA | SD-Search-Instruct | Exact Match (EM) | 47.1 | 66 | 14d ago |
| HotpotQA fullwiki setting (test) | IDRQA | Answer F1 | 75.9 | 64 | 3mo ago |
| Bamboogle | TC+FM* | Accuracy | 75.2 | 62 | 2mo ago |
| Multi-Hop QA (HotpotQA, 2Wiki, Musique, Bamboogle) (test) | JADE | HotpotQA Score | 57.02 | 60 | 6d ago |
| MuSiQue | EvalAct | Exact Match (EM) | 25.3 | 58 | 1mo ago |
| Bamboogle | SD-Search-Instruct | Exact Match (EM) | 54.4 | 55 | 13d ago |
| HotPotQA | Teacher (OPT13B) | CoT Match Rate | 100 | 54 | 2mo ago |
Showing 25 of 437 rows
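
Most rows above report EM (exact match) or F1. As a rough illustration of what those two metrics usually measure in extractive QA, here is a minimal sketch that assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles). The function names are hypothetical, and individual leaderboard entries may use different normalization or aggregation, so this is not a description of how any listed score was computed.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; specific benchmarks may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this sketch, a prediction of "Barack Obama" against the gold answer "Obama" scores 0 on exact match but earns partial credit on F1, which is one reason the two metrics can rank methods differently on the same dataset. Scores on a 0 to 100 scale, like those in the table, would correspond to the per-question average multiplied by 100.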