Mathematical Reasoning on AIME25 (pass@1, pass@16)
[Chart: Pass@1 and Pass@16 by date. Highlighted point: EVOL-RL, Pass@1 21.6. Date shown: Sep 18, 2025.]
Updated 1mo ago
Evaluation Results
Method         Details                     Date     Pass@1  Pass@16
EVOL-RL        Base Model Size=8B, Tr...   2025.09  21.6    43.1
EVOL-RL        Base Model Size=8B, Tr...   2025.09  20.2    44.4
EVOL-RL        Base Model Size=4B, Tr...   2025.09  17.5    39.9
EVOL-RL        Base Model Size=4B, Tr...   2025.09  17.1    42
TTRL           Base Model Size=8B, Tr...   2025.09  16.5    34.3
EVOL-RL        Base Model Size=8B, Tr...   2025.09  16.5    34.7
EVOL-RL        Base Model Size=4B, Tr...   2025.09  16.1    41.9
TTRL           Base Model Size=8B, Tr...   2025.09  15.6    35.9
TTRL           Base Model Size=8B, Tr...   2025.09  11.4    25.4
Qwen3-8B-Base  Base Model Size=8B, Tr...   2025.09  8.2     30.8
TTRL           Base Model Size=4B, Tr...   2025.09  7.2     29.9
TTRL           Base Model Size=4B, Tr...   2025.09  6.8     28.6
Qwen3-4B-Base  Base Model Size=4B, Tr...   2025.09  5.5     30
TTRL           Base Model Size=4B, Tr...   2025.09  4.6     18.5