Mathematical Reasoning on AIME 2024 (Pass@1, TC)
[Chart: Pass@1 Accuracy over time. Labeled point: AEPO, 30 (Jun 1, 2026).]
Updated 1d ago
Evaluation Results
| Method | Configuration | Date | Pass@1 Accuracy | TC |
|---|---|---|---|---|
| AEPO | Backbone Model=Qwen2.5... | 2026.06 | 30 | 1.2 |
| EAPO | Backbone Model=Qwen2.5... | 2026.06 | 30 | 1 |
| Reinforce++ | Backbone Model=Qwen2.5... | 2026.06 | 26.6 | 1.1 |
| ARPO | Backbone Model=Qwen2.5... | 2026.06 | 26.6 | 1.16 |
| ARPO | Backbone Model=Qwen2.5... | 2026.06 | 23.3 | 1.1 |
| EAPO | Backbone Model=Qwen2.5... | 2026.06 | 23.3 | 0.9 |
| GRPO | Backbone Model=Qwen2.5... | 2026.06 | 23.3 | 0.93 |
| ToolStar | Backbone Model=Qwen2.5... | 2026.06 | 23.3 | 1.86 |
| GRPO | Backbone Model=Qwen2.5... | 2026.06 | 13.3 | 1.1 |
| Reinforce++ | Backbone Model=Qwen2.5... | 2026.06 | 13.3 | 1.21 |
| ToolStar | Backbone Model=Qwen2.5... | 2026.06 | 13.3 | 1.03 |
| Base | Backbone Model=Qwen2.5... | 2026.06 | 10 | - |
| TIR | Backbone Model=Qwen2.5... | 2026.06 | 6.7 | 0.2 |
| TIR | Backbone Model=Qwen2.5... | 2026.06 | 6.7 | 0.2 |
| Base | Backbone Model=Qwen2.5... | 2026.06 | 3.3 | - |