Mathematical Reasoning on AIME25, AMC23, MATH500, Minerva Aggregate
Top Average Score: 72.16 (GRPO w/ Structure Reward)
[Chart: Average Score and Improvement over time; all results dated Mar 30, 2026]
Evaluation Results
Method                               RL Category    Date     Average Score  Improvement (points over Base)
GRPO w/ Structure Reward             Label-free...  2026.03  72.16          7.65
GRPO w/ Ground-Truth                 Ground-Tru...  2026.03  71.66          7.15
GRPO w/ Entropy Minimization (EMPO)  Label-free...  2026.03  71.45          6.94
GRPO w/ Majority Voting (TTRL)       Label-free...  2026.03  71.12          6.61
PPO w/ Structure Reward              Label-free...  2026.03  70.38          5.87
PPO w/ Ground-Truth                  Ground-Tru...  2026.03  70.18          5.67
Base                                                2026.03  64.51          -
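Every value in the Improvement column equals the method's Average Score minus the Base score of 64.51. For example, 72.16 - 64.51 = 7.65. The minimal Python sketch below recomputes that column from the table values and ranks the methods. The names `results`, `BASE_SCORE`, and `improvement` are illustrative and are not part of any official benchmark tooling.

```python
# Average Scores copied from the Evaluation Results table.
# Names here are illustrative, not from any benchmark tooling.
BASE_SCORE = 64.51

results = {
    "GRPO w/ Structure Reward": 72.16,
    "GRPO w/ Ground-Truth": 71.66,
    "GRPO w/ Entropy Minimization (EMPO)": 71.45,
    "GRPO w/ Majority Voting (TTRL)": 71.12,
    "PPO w/ Structure Reward": 70.38,
    "PPO w/ Ground-Truth": 70.18,
}


def improvement(score: float, base: float = BASE_SCORE) -> float:
    """Absolute gain in Average Score points over Base, rounded to 2 decimals."""
    return round(score - base, 2)


# Rank methods by Average Score (highest first).
ranked = sorted(results.items(), key=lambda kv: kv[1], reverse=True)

# Print each method's score and its gain over Base.
for method, score in ranked:
    print(f"{method:<40} {score:6.2f} {improvement(score):+6.2f}")
```

Rounding to two decimals hides floating-point noise in the subtraction, so the printed gains match the published column.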