Multi-Hop Question Answering on MuSiQue (out-of-domain)
[Chart: Accuracy by date. Best result: StepSearch, 22.6 Accuracy; date shown: Feb 9, 2026.]
Evaluation Results
Method        Setting                      Date      Accuracy
StepSearch    Backbone=Qwen2.5-7B-In...    2026.02   22.6
SKILLRL       Backbone=Qwen2.5-7B-In...    2026.02   20.2
ZeroSearch    Backbone=Qwen2.5-7B-In...    2026.02   18.4
EvolveR       Backbone=Qwen2.5-7B-In...    2026.02   15.6
Search-R1     Backbone=Qwen2.5-7B-In...    2026.02   14.6
RAG           Backbone=Qwen2.5-7B-In...    2026.02   9.4
Search-o1     Backbone=Qwen2.5-7B-In...    2026.02   8.6
CoT           Backbone=Qwen2.5-7B-In...    2026.02   6.6
R1-Instruct   Backbone=Qwen2.5-7B-In...    2026.02   6.0
Qwen2.5       Backbone=Qwen2.5-7B-In...    2026.02   4.8