Character Reasoning on Letter CD (4,5) (OOD)
[Chart: Accuracy by date. Top result: R1 Distill -> GRPO at 14.4. Latest data point: Dec 3, 2025.]
Evaluation Results

Method                   Training     Date      Accuracy
R1 Distill -> GRPO       SFT + GRPO   2025.12   14.4
BOLT -> GRPO             SFT + GRPO   2025.12   13.1
SkillFactory -> GRPO     SFT + GRPO   2025.12   12.8
RL-Only                  RL           2025.12   12.5
Qwen2.5 1.5B Instruct    None         2025.12   10.4
STaR -> GRPO             SFT + GRPO   2025.12   9.2
R1 Distill               SFT          2025.12   8.8
SkillFactory             SFT          2025.12   8.7
STaR                     SFT          2025.12   7.3
BOLT                     SFT          2025.12   5.5
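As a reading aid, this short sketch measures how much the GRPO stage adds for each SFT-only initialization. It uses only the accuracy values from the Evaluation Results table. The helper name `grpo_gains` is a hypothetical choice for illustration and is not part of any benchmark tooling.

```python
# Accuracy values copied from the Evaluation Results table (SFT-only rows).
sft_only = {
    "R1 Distill": 8.8,
    "SkillFactory": 8.7,
    "STaR": 7.3,
    "BOLT": 5.5,
}

# Accuracy values for the same initializations after the GRPO stage (SFT + GRPO rows).
sft_then_grpo = {
    "R1 Distill": 14.4,
    "BOLT": 13.1,
    "SkillFactory": 12.8,
    "STaR": 9.2,
}


def grpo_gains(before, after):
    """Hypothetical helper: accuracy change from GRPO per initialization, rounded to 0.1."""
    return {name: round(after[name] - before[name], 1) for name in before}


gains = grpo_gains(sft_only, sft_then_grpo)

# Print initializations from largest to smallest GRPO gain.
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<14} +{gain}")
```

In this table, BOLT gains the most from GRPO (+7.6, from 5.5 to 13.1) and STaR gains the least (+1.9, from 7.3 to 9.2). All four SFT-only variants score below the untrained Qwen2.5 1.5B Instruct baseline of 10.4.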