Arithmetic Reasoning on Long Multiplication (2-, 3-, 4-, and 5-digit, out-of-distribution)
Chart: accuracy over time (top result: R1 Distill -> GRPO, accuracy 37.1; dated Dec 3, 2025).
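The page does not describe how problems are generated or scored. The sketch below is purely illustrative and rests on stated assumptions: both operands have the same number of digits (2 to 5), the model's final answer is the last integer in its output, and accuracy is exact match against the true product, reported as a percentage. The names `make_problems`, `extract_answer`, and `exact_match_accuracy` are hypothetical.

```python
import random
import re


def make_problems(digit_counts=(2, 3, 4, 5), per_size=5, seed=0):
    """Hypothetical generator: both operands have exactly d digits for each d."""
    rng = random.Random(seed)
    problems = []
    for d in digit_counts:
        lo, hi = 10 ** (d - 1), 10 ** d - 1
        for _ in range(per_size):
            problems.append((rng.randint(lo, hi), rng.randint(lo, hi)))
    return problems


def extract_answer(completion):
    """Assumption: the last integer in the output (commas stripped) is the answer."""
    matches = re.findall(r"-?\d+", completion.replace(",", ""))
    return int(matches[-1]) if matches else None


def exact_match_accuracy(problems, completions):
    """Percentage of problems whose extracted answer equals a * b."""
    correct = sum(
        extract_answer(c) == a * b for (a, b), c in zip(problems, completions)
    )
    return 100.0 * correct / len(problems)


# Usage: a "model" that always answers correctly scores 100.0.
problems = make_problems()
perfect = [f"{a} x {b} = {a * b}" for a, b in problems]
print(exact_match_accuracy(problems, perfect))
```

Exact match is strict: a single wrong digit in a long product counts as a miss. This strictness is one reason accuracy on larger digit counts tends to drop sharply.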
Evaluation Results
| Method | Training | Date | Accuracy |
|---|---|---|---|
| R1 Distill -> GRPO | SFT + GRPO | 2025.12 | 37.1 |
| SkillFactory -> GRPO | SFT + GRPO | 2025.12 | 35 |
| R1 Distill | SFT | 2025.12 | 32.4 |
| SkillFactory | SFT | 2025.12 | 32.4 |
| Qwen2.5 1.5B Instruct | None | 2025.12 | 29.8 |
| BOLT -> GRPO | SFT + GRPO | 2025.12 | 26.6 |
| RL-Only | RL | 2025.12 | 24.4 |
| STaR -> GRPO | SFT + GRPO | 2025.12 | 23.2 |
| STaR | SFT | 2025.12 | 22.1 |
| BOLT | SFT | 2025.12 | 15.1 |
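Four methods appear both as SFT-only and as SFT followed by GRPO, so the table lets us read off how much the GRPO stage adds for each. The short computation below uses only the accuracy values listed above. The helper name `grpo_gains` is illustrative.

```python
# Accuracy values copied from the results table.
sft_only = {"R1 Distill": 32.4, "SkillFactory": 32.4, "BOLT": 15.1, "STaR": 22.1}
sft_then_grpo = {"R1 Distill": 37.1, "SkillFactory": 35.0, "BOLT": 26.6, "STaR": 23.2}


def grpo_gains(before, after):
    """Accuracy change (points) from adding the GRPO stage, rounded to 1 decimal."""
    return {name: round(after[name] - before[name], 1) for name in before}


gains = grpo_gains(sft_only, sft_then_grpo)
# Print methods from largest to smallest gain.
for name, delta in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name:<12} {delta:+.1f}")
```

This prints BOLT (+11.5), R1 Distill (+4.7), SkillFactory (+2.6), and STaR (+1.1). GRPO raises every SFT starting point. The largest gain comes from the weakest SFT model, but R1 Distill -> GRPO still ranks first overall.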