Mathematical Reasoning on Countdown 4,5,6-arg held-out difficulties (test)

Accuracy: 25.1

SkillFactory -> GRPO

Updated 5mo ago

Evaluation Results

Method	Date	Accuracy
SkillFactory -> GRPO	2025.12	25.1
R1 Distill -> GRPO	2025.12	21.2
RL-Only	2025.12	15.8
BOLT -> GRPO	2025.12	13.7
R1 Distill	2025.12	11.7
STaR -> GRPO	2025.12	9.7
SkillFactory	2025.12	2.8
STaR	2025.12	2.6
Qwen2.5 1.5B Instruct	2025.12	1.9
BOLT	2025.12	0.5