Mathematics on AIME 25 (pass@1 (%))
[Chart: Pass@1. Top entry: Scaf-GRPO, 23.3, Oct 22, 2025]
Evaluation Results
| Method | Base Model | Date | Pass@1 |
|---|---|---|---|
| Scaf-GRPO | DeepSeek-R1... | 2025.10 | 23.3 |
| Vanilla GRPO | DeepSeek-R1... | 2025.10 | 21.1 |
| Scaf-GRPO | Qwen2.5-Mat... | 2025.10 | 20 |
| Scaf-GRPO | Qwen2.5-7B | 2025.10 | 20 |
| DeepSeek-R1-Distill-Qwen-1.5B | DeepSeek-R1... | 2025.10 | 20 |
| Oat-Zero | Qwen2.5-Mat... | 2025.10 | 16.7 |
| LUFFY | Qwen2.5-Mat... | 2025.10 | 16.7 |
| Scaf-GRPO | Qwen2.5-Mat... | 2025.10 | 13.3 |
| Qwen2.5-Math-7B | Qwen2.5-Mat... | 2025.10 | 13.3 |
| Vanilla GRPO | Qwen2.5-Mat... | 2025.10 | 13.3 |
| SimpleRL-Zero | Qwen2.5-Mat... | 2025.10 | 13.3 |
| Vanilla GRPO | Qwen2.5-Mat... | 2025.10 | 10 |
| Vanilla GRPO | Qwen2.5-7B | 2025.10 | 10 |
| Qwen2.5-7B | Qwen2.5-7B | 2025.10 | 6.7 |
| Qwen2.5-Math-1.5B | Qwen2.5-Mat... | 2025.10 | 3.3 |
| Scaf-GRPO | Llama-3.2-3... | 2025.10 | 3.3 |
| Llama-3.2-3B-Instruct | Llama-3.2-3... | 2025.10 | 0 |
| Vanilla GRPO | Llama-3.2-3... | 2025.10 | 0 |