Multi-Hop Question Answering on HotpotQA (in-domain) (Accuracy)
[Chart: Accuracy. Top result: SKILLRL, 43.2 (Feb 9, 2026).]
Evaluation Results
Method        Details                     Date      Accuracy
SKILLRL       Backbone=Qwen2.5-7B-In...   2026.02   43.2
StepSearch    Backbone=Qwen2.5-7B-In...   2026.02   38.6
EvolveR       Backbone=Qwen2.5-7B-In...   2026.02   38.2
Search-R1     Backbone=Qwen2.5-7B-In...   2026.02   37
ZeroSearch    Backbone=Qwen2.5-7B-In...   2026.02   34.6
RAG           Backbone=Qwen2.5-7B-In...   2026.02   25.8
R1-Instruct   Backbone=Qwen2.5-7B-In...   2026.02   20.8
Search-o1     Backbone=Qwen2.5-7B-In...   2026.02   17
Qwen2.5       Backbone=Qwen2.5-7B-In...   2026.02   16.4
CoT           Backbone=Qwen2.5-7B-In...   2026.02   16.2