Multi-Hop Question Answering on 2wiki 2018 Wikipedia dump (dev)
[Chart: Accuracy (%) by date. Best result: MR-Search, 43.6, Mar 11, 2026]
Updated 1mo ago
Evaluation Results
Method            Backbone      Date      Accuracy (%)
MR-Search         Qwen2.5-7b    2026.03   43.6
StepResearch      Qwen2.5-7b    2026.03   41.8
MR-Search         Qwen2.5-3b    2026.03   40.1
Search-R1         Qwen2.5-7b    2026.03   38.7
ReSearch          Qwen2.5-7b    2026.03   38.6
PPRM              Qwen2.5-7b    2026.03   35.5
PPRM              Qwen2.5-3b    2026.03   34
StepResearch      Qwen2.5-3b    2026.03   33.8
Search-R1         Qwen2.5-3b    2026.03   31
ReSearch          Qwen2.5-3b    2026.03   27.2
Direct Inference  Qwen2.5-7b    2026.03   25
Direct Inference  Qwen2.5-3b    2026.03   24.4
Search-o1         Qwen2.5-3b    2026.03   21.8
Search-o1         Qwen2.5-7b    2026.03   17.6