
Mathematical Reasoning on AIME25, AMC23, MATH500, Minerva Aggregate

72.16 Average Score

GRPO w/ Structure Reward

[Chart: Average Score; y-axis ticks 64.204, 66.2695, 68.335, 70.4005; date shown: Mar 30, 2026]
Updated 2mo ago

Evaluation Results

Date      Average Score   Gain vs. 64.51
2026.03   72.16           7.65
2026.03   71.66           7.15
2026.03   71.45           6.94
2026.03   71.12           6.61
2026.03   70.38           5.87
2026.03   70.18           5.67
2026.03   64.51           -
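
The smaller figure in each row appears to be that row's average score minus 64.51, the score in the last row. The page does not label it, so treating 64.51 as the reference is an assumption. A minimal Python sketch of that arithmetic, using only the scores listed above:

```python
from decimal import Decimal

# Average scores as listed in the results above, top to bottom.
scores = ["72.16", "71.66", "71.45", "71.12", "70.38", "70.18", "64.51"]

# Assumption: the last row (64.51) is the reference the other rows are compared to.
reference = Decimal(scores[-1])

# Decimal keeps the two-decimal subtraction exact (no binary float rounding).
gains = [Decimal(s) - reference for s in scores[:-1]]

for score, gain in zip(scores, gains):
    print(f"{score}  +{gain}")
```

Each computed difference matches the figure listed next to the corresponding score, which supports reading that figure as the gain over the 64.51 row.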