Mathematical Reasoning on AMC23 (avg@8)
[Chart: Avg@8 Score over time. Best result: 87.5 (GRPO w/ Structure Reward), Mar 30, 2026. Updated 2mo ago.]
Evaluation Results
Method                                 RL Category     Date      Avg@8 Score
GRPO w/ Structure Reward               Label-free...   2026.03   87.5
PPO w/ Ground-Truth                    Ground-Tru...   2026.03   86.56
GRPO w/ Entropy Minimization (EMPO)    Label-free...   2026.03   86.56
GRPO w/ Majority Voting (TTRL)         Label-free...   2026.03   86.56
PPO w/ Structure Reward                Label-free...   2026.03   85
GRPO w/ Ground-Truth                   Ground-Tru...   2026.03   84.38
Base                                   -               2026.03   82.81
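
The page does not define the avg@8 metric. A minimal sketch follows, assuming avg@8 means the per-problem accuracy averaged over 8 sampled responses, then averaged over all problems and reported as a percentage. The function name `avg_at_k` and the toy grading data are hypothetical and are not taken from the benchmark.

```python
from typing import Sequence


def avg_at_k(correct: Sequence[Sequence[bool]], k: int = 8) -> float:
    """Mean per-problem accuracy over k sampled responses, as a percentage.

    Assumption: this is one common reading of "avg@k"; the leaderboard
    itself does not spell out the definition.
    """
    if not correct:
        raise ValueError("need at least one problem")
    per_problem = []
    for samples in correct:
        if len(samples) != k:
            raise ValueError(f"expected {k} samples per problem, got {len(samples)}")
        # Fraction of the k samples graded correct for this problem.
        per_problem.append(sum(samples) / k)
    return 100.0 * sum(per_problem) / len(per_problem)


# Toy data (hypothetical): 2 problems, 8 graded samples each.
toy = [
    [True] * 8,                # solved in all 8 samples
    [True] * 7 + [False] * 1,  # solved in 7 of 8 samples
]
print(avg_at_k(toy))  # 93.75
```

Averaging over several samples per problem, rather than scoring a single response, reduces the variance that sampling introduces on a small problem set.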