
Arithmetic Reasoning on Long Multiplication 2,3,4,5-digit (OOD)

37.1 Accuracy

R1 Distill -> GRPO

Updated 5mo ago

Evaluation Results

Method	Date	Accuracy
R1 Distill -> GRPO	2025.12	37.1
SkillFactory -> GRPO	2025.12	35
R1 Distill	2025.12	32.4
SkillFactory	2025.12	32.4
Qwen2.5 1.5B Instruct	2025.12	29.8
BOLT -> GRPO	2025.12	26.6
RL-Only	2025.12	24.4
STaR -> GRPO	2025.12	23.2
STaR	2025.12	22.1
BOLT	2025.12	15.1