Mathematical Reasoning on AIME 24 (pass@1, pass@16)
[Chart: Pass@1 and Pass@16 by date. Highlighted point: EVOL-RL, Pass@1 = 26, Sep 18, 2025.]
Updated 1mo ago
Evaluation Results
Method          Details                      Date      Pass@1   Pass@16
EVOL-RL         Base Model Size=8B, Tr...    2025.09   26       51.7
EVOL-RL         Base Model Size=8B, Tr...    2025.09   25.4     38.1
EVOL-RL         Base Model Size=8B, Tr...    2025.09   24.1     49.5
EVOL-RL         Base Model Size=4B, Tr...    2025.09   20.7     47.6
EVOL-RL         Base Model Size=4B, Tr...    2025.09   20.6     40.9
TTRL            Base Model Size=8B, Tr...    2025.09   20       20
EVOL-RL         Base Model Size=4B, Tr...    2025.09   19       43.2
TTRL            Base Model Size=8B, Tr...    2025.09   17.7     40.1
TTRL            Base Model Size=4B, Tr...    2025.09   16.7     16.7
TTRL            Base Model Size=8B, Tr...    2025.09   16.7     37.6
TTRL            Base Model Size=4B, Tr...    2025.09   12.1     23.2
Qwen3-8B-Base   Base Model Size=8B, Tr...    2025.09   12       39.4
Qwen3-4B-Base   Base Model Size=4B, Tr...    2025.09   10       32.4
TTRL            Base Model Size=4B, Tr...    2025.09   10       28