
Math Problem Solving on AIME 2025 (test)

Accuracy: 30

SelfBudgeter

Updated 3mo ago

Evaluation Results

Method	Links	Accuracy
SelfBudgeter 2025.05		30	12,241.84
DeepSeek-R1-Distill-Qwen 2025.05		28.89	22,158.79
DeepSeek-R1-Distill-Qwen 2025.05		22.22	14,444.03
E1-Math-1.5B 2025.05		21.11	5,578.13
SelfBudgeter 2025.05		21.11	4,288.1
L1-Max 2025.05		17.88	5,213.89
Eurus-2-7B-PRIME 2025.05		15.56	1,254.52
Qwen-2.5-7B-Simple-RL 2025.05		6.67	1,429.94
E1-Math-1.5B 2025.05		4.44	3,008.44