Multi-hop Question Answering on 2WikiMultiHopQA (official evaluation)
[Chart: Exact Match (EM) by date. Top result: HRPO, 31.8 EM, Dec 1, 2025.]
Evaluation Results
Method      Backbone             Date      Exact Match (EM)
HRPO        Qwen2.5-3B-In...     2025.12   31.8
M3PO        Qwen2.5-3B-In...     2025.12   31.4
GRPO        Qwen2.5-3B-In...     2025.12   30.3
PPO         Qwen2.5-3B-In...     2025.12   29.3
M3PO        Qwen2.5-1.5B-...     2025.12   27.9
HRPO        Qwen2.5-1.5B-...     2025.12   27.6
QA          Qwen2.5-7B-In...     2025.12   25
SFT         Qwen2.5-3B-In...     2025.12   24.8
PPO         Qwen2.5-1.5B-...     2025.12   24.2
RAG         Qwen2.5-7B-In...     2025.12   23.5
RAG         Qwen2.5-3B-In...     2025.12   22.6
GRPO        Qwen2.5-1.5B-...     2025.12   21.3
SFT         Qwen2.5-1.5B-...     2025.12   21
RAG         Qwen2.5-1.5B-...     2025.12   20.3
Search-o1   Qwen2.5-7B-In...     2025.12   17.6
IRCoT       Qwen2.5-7B-In...     2025.12   14.9
CoT         Qwen2.5-7B-In...     2025.12   11.1