Mathematical Reasoning on AIME24, AIME25, and AMC23 (test)
[Chart: AIME24 pass@32 by date. Highlighted point: CIPO, 86.67 (May 14, 2026). Selectable metrics: AIME24 pass@32, AIME25 pass@32, AMC23 pass@32, Average Pass@32.]
Updated 19d ago
Evaluation Results

| Method | Configuration | Date | AIME24 pass@32 | AIME25 pass@32 | AMC23 pass@32 | Average Pass@32 |
|---|---|---|---|---|---|---|
| CIPO | Backbone=Qwen3-4B | 2026.05 | 86.67 | 70 | 100 | 85.56 |
| GRPO | Backbone=Qwen3-4B, BS=128 | 2026.05 | 76.67 | 63.33 | 95 | 78.33 |
| GRPO | Backbone=Qwen3-4B, BS=256 | 2026.05 | 73.33 | 70 | 95 | 79.44 |
| GRPO | Backbone=Qwen3-4B, m=1... | 2026.05 | 63.33 | 60 | 92.5 | 71.94 |
| Initial | Backbone=Qwen3-4B | 2026.05 | 60 | 53.33 | 97.5 | 70.28 |
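The page does not say how Average Pass@32 is derived, but the listed numbers are consistent with an unweighted mean of the three benchmark scores. The sketch below checks that assumption against the results listed above; the names `rows`, `mean_pass`, and `check_averages` are illustrative and do not come from the source, and the truncated `m=1...` label is kept exactly as it appears.

```python
# Scores copied from the results above: (AIME24, AIME25, AMC23, reported average).
# Assumption: "Average Pass@32" is the unweighted mean of the three pass@32 scores.
rows = {
    "CIPO": (86.67, 70.0, 100.0, 85.56),
    "GRPO (BS=128)": (76.67, 63.33, 95.0, 78.33),
    "GRPO (BS=256)": (73.33, 70.0, 95.0, 79.44),
    "GRPO (m=1...)": (63.33, 60.0, 92.5, 71.94),  # label truncated in the source
    "Initial": (60.0, 53.33, 97.5, 70.28),
}


def mean_pass(scores):
    """Unweighted mean of a sequence of benchmark scores."""
    return sum(scores) / len(scores)


def check_averages(table, tol=0.01):
    """Map each method to True if the reported average matches the mean within tol."""
    return {
        name: abs(mean_pass(vals[:3]) - vals[3]) <= tol
        for name, vals in table.items()
    }


if __name__ == "__main__":
    for name, ok in check_averages(rows).items():
        print(f"{name}: {'consistent' if ok else 'mismatch'}")
```

The tolerance of 0.01 allows for the two-decimal rounding of both the per-benchmark scores and the reported averages.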