SOTA Multi-hop Reasoning benchmarks and papers with code | Wizwand
Multi-hop Reasoning
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| MuSiQue | GPT-4o-0806 | EM | 53 | 41 | 1mo ago |
| StrategyQA | OpenMath2-Llama3.1-70B* | Accuracy | 95.6 | 36 | 1mo ago |
| LoCoMo | GAM | F1 Score | 35.88 | 28 | 3d ago |
| MuSiQue | AoT | Accuracy | 39.6 | 27 | 10d ago |
| 2WikiMQA IRCoT 500 samples (test) | ActiShade | ACC | 52.8 | 27 | 1mo ago |
| HotpotQA IRCoT (500 samples) (test) | ActiShade | ACC | 54.6 | 27 | 1mo ago |
| MuSiQue IRCoT 500 samples (test) | ActiShade | ACC | 25.59 | 27 | 1mo ago |
| 2WikiMHQA | CoT-UQ | AUROC | 0.7002 | 26 | 1mo ago |
| HotpotQA | CoT-UQ | AUROC | 67.19 | 26 | 1mo ago |
| HotpotQA | CoT2-Meta | Accuracy | 90.4 | 20 | 16d ago |
| TriviaQA | Prompt-R1 | Exact Match (EM) | 70.31 | 17 | 25d ago |
| RULER QA | Qwen2.5-14B-1M-LongRLVR | Accuracy (32K Context) | 95.4 | 17 | 1mo ago |
| CommaQA-E compositional | ChatGPT (SKiC) | Exact Match | 80.8 | 15 | 1mo ago |
| CommaQA-E (test) | ChatGPT (SKiC) | Exact Match | 70 | 15 | 1mo ago |
| MuSiQue | CoT | Relative Cost | 1 | 14 | 10d ago |
| HotpotQA | CoT | Relative Cost | 1 | 14 | 10d ago |
| MultiHopRAG | Qwen2.5-OpAmp-72B | EM | 89.6 | 11 | 1mo ago |
| WebQSP | KG-Reasoner | Hits@1 | 93.15 | 10 | 3d ago |
| CWQ | LMP | Hits@1 | 82.2 | 10 | 3d ago |
| 2WikiMultihopQA | Prompt-R1 | Exact Match (EM) | 48.44 | 10 | 1mo ago |
| MuSR | Qwen3 8B | Accuracy | 43.12 | 10 | 1mo ago |
| QASPER | MergeRAG-Sym | EM | 15 | 7 | 25d ago |
| MuSiQue | MergeRAG-Asym | Exact Match (EM) | 24 | 7 | 25d ago |
| LongBench MuSiQue and WikiMultiHopQA | MGRS | F1 Score | 69.9 | 7 | 1mo ago |
| MultiHopRAG Average 1.0 (test) | UniAI-GraphRAG | Relevancy | 64.47 | 4 | 22d ago |
Showing 25 of 34 rows