Mathematical Reasoning on AIME 2025 (Pass@1, TC)
[Chart: Pass@1 and TC by method. Best Pass@1: 30 (ARPO), Jun 1, 2026.]
Updated 1d ago
Evaluation Results
| Method | Backbone Model | Date | Pass@1 | TC |
|---|---|---|---|---|
| ARPO | Qwen2.5... | 2026.06 | 30 | 1.03 |
| EAPO | Qwen2.5... | 2026.06 | 26.6 | 0.93 |
| GRPO | Qwen2.5... | 2026.06 | 23.3 | 0.83 |
| Reinforce++ | Qwen2.5... | 2026.06 | 23.3 | 1.03 |
| AEPO | Qwen2.5... | 2026.06 | 23.3 | 1.21 |
| ToolStar | Qwen2.5... | 2026.06 | 20 | 1.4 |
| Reinforce++ | Qwen2.5... | 2026.06 | 16.7 | 1.3 |
| ARPO | Qwen2.5... | 2026.06 | 16.7 | 1.3 |
| GRPO | Qwen2.5... | 2026.06 | 13.3 | 1.22 |
| ToolStar | Qwen2.5... | 2026.06 | 13.3 | 1.11 |
| EAPO | Qwen2.5... | 2026.06 | 13.3 | 0.89 |
| Base | Qwen2.5... | 2026.06 | 10 | - |
| TIR | Qwen2.5... | 2026.06 | 10 | 0.16 |
| Base | Qwen2.5... | 2026.06 | 6.7 | - |
| TIR | Qwen2.5... | 2026.06 | 6.7 | 0.2 |