Mathematical Reasoning on Math held-out task instances (test)

20.4 Accuracy

Full ExIt

Updated 5mo ago

Evaluation Results

Method	Date	Links	Accuracy
Full ExIt	2025.09		20.4	2
Diverge (ExIt ablation)	2025.09		20.1	1.6
Improve (ExIt ablation)	2025.09		19.6	1.2
GRPO + curriculum	2025.09		18.8	0.9
GRPO	2025.09		18.7	1.1
Base model	2025.09		17.4	-0.4