Multi-Hop Question Answering on HotpotQA 2018 Wikipedia dump (dev)
Best result: MR-Search, 46.8 Accuracy (Mar 11, 2026)
Evaluation Results
Method            Backbone     Date     Accuracy
MR-Search         Qwen2.5-7b   2026.03  46.8
Search-R1         Qwen2.5-7b   2026.03  43.9
StepResearch      Qwen2.5-7b   2026.03  43.9
MR-Search         Qwen2.5-3b   2026.03  41.9
PPRM              Qwen2.5-7b   2026.03  38.6
ReSearch          Qwen2.5-7b   2026.03  37.8
StepResearch      Qwen2.5-3b   2026.03  37.3
PPRM              Qwen2.5-3b   2026.03  35.3
Search-R1         Qwen2.5-3b   2026.03  32.6
ReSearch          Qwen2.5-3b   2026.03  30.5
Search-o1         Qwen2.5-3b   2026.03  22.1
Search-o1         Qwen2.5-7b   2026.03  18.7
Direct Inference  Qwen2.5-7b   2026.03  18.3
Direct Inference  Qwen2.5-3b   2026.03  14.9