Mathematical Reasoning on AIME24 (pass@8)
[Chart: Pass@8 by date. Highlighted point: GRPO, 83.33 Pass@8, May 9, 2026.]
Updated 22d ago
Evaluation Results

Method                        Date      Pass@8
GRPO                          2026.05   83.33
OPSD                          2026.05   80
OPHSD (harness=plan-solve)    2026.05   80
OPHSD (harness=none)          2026.05   79.17
CRISP                         2026.05   56.67