Mathematical Word Problems on GSM8k (OOD)
[Chart: Accuracy by date. Top entry: R1 Distill -> GRPO, 72.9 (Dec 3, 2025)]
Evaluation Results
Method                  Training      Date      Accuracy
R1 Distill -> GRPO      SFT + GRPO    2025.12   72.9
BOLT -> GRPO            SFT + GRPO    2025.12   69.7
STaR -> GRPO            SFT + GRPO    2025.12   68.6
SkillFactory -> GRPO    SFT + GRPO    2025.12   68.2
RL-Only                 RL            2025.12   67.7
R1 Distill              SFT           2025.12   62.9
Qwen2.5 1.5B Instruct   None          2025.12   59.2
SkillFactory            SFT           2025.12   59.1
STaR                    SFT           2025.12   31.1
BOLT                    SFT           2025.12   23.4