Word Reasoning on Acronym (4,5) (OOD)
[Chart: Accuracy over time. Highlighted point: BOLT -> GRPO, 12.3 (Dec 3, 2025).]
Evaluation Results
Method                  Configuration               Date     Accuracy
BOLT -> GRPO            base_model=Qwen2.5-1.5...   2025.12  12.3
SkillFactory -> GRPO    base_model=Qwen2.5-1.5...   2025.12  12.1
STaR -> GRPO            base_model=Qwen2.5-1.5...   2025.12  9.8
R1 Distill              base_model=Qwen2.5-1.5...   2025.12  9.4
RL-Only                 base_model=Qwen2.5-1.5...   2025.12  8.7
Qwen2.5 1.5B Instruct   base_model=Qwen2.5-1.5...   2025.12  6.9
BOLT                    base_model=Qwen2.5-1.5...   2025.12  6.2
R1 Distill -> GRPO      base_model=Qwen2.5-1.5...   2025.12  6
STaR                    base_model=Qwen2.5-1.5...   2025.12  4
SkillFactory            base_model=Qwen2.5-1.5...   2025.12  3