Multi-Hop Question Answering on Musique 2018 Wikipedia dump (dev)
[Chart: Accuracy by date. Top result: MR-Search, 22.1 (Mar 11, 2026).]
Evaluation Results
Method            Backbone     Date     Accuracy
MR-Search         Qwen2.5-7b   2026.03  22.1
StepResearch      Qwen2.5-7b   2026.03  20.5
Search-R1         Qwen2.5-7b   2026.03  18.1
ReSearch          Qwen2.5-7b   2026.03  16.6
MR-Search         Qwen2.5-3b   2026.03  16.5
PPRM              Qwen2.5-7b   2026.03  14.7
PPRM              Qwen2.5-3b   2026.03  12.7
StepResearch      Qwen2.5-3b   2026.03  10.5
Search-R1         Qwen2.5-3b   2026.03  7.7
ReSearch          Qwen2.5-3b   2026.03  7.4
Search-o1         Qwen2.5-7b   2026.03  5.8
Search-o1         Qwen2.5-3b   2026.03  5.4
Direct Inference  Qwen2.5-7b   2026.03  3.1
Direct Inference  Qwen2.5-3b   2026.03  2