Mathematical Reasoning on AIME 25 (avg@16)
Best result: Scaf-GRPO, 25.3 Avg@16
[Chart: Avg@16 results over time, dated Oct 22, 2025]
Updated 1mo ago
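Avg@16 is commonly understood as sampling 16 independent responses per problem, scoring each one as correct or incorrect, and averaging accuracy across all samples and problems. This page does not define the metric, so that reading is an assumption. The sketch below computes it under that assumption. The function name `avg_at_k` and the toy data are hypothetical and only illustrate the arithmetic.

```python
from statistics import mean


def avg_at_k(correct: list[list[bool]]) -> float:
    """Avg@k as a percentage (assumed definition): for each problem, take the
    fraction of its k sampled responses judged correct, then average those
    fractions over all problems."""
    if not correct:
        raise ValueError("need at least one problem")
    per_problem = []
    for samples in correct:
        if not samples:
            raise ValueError("each problem needs at least one sample")
        # True counts as 1 and False as 0, so sum() gives the number correct.
        per_problem.append(sum(samples) / len(samples))
    return 100.0 * mean(per_problem)


# Hypothetical toy data: 2 problems, k = 16 samples each.
# Problem 1: 4 of 16 correct (0.25). Problem 2: 16 of 16 correct (1.0).
toy = [[True] * 4 + [False] * 12, [True] * 16]
print(avg_at_k(toy))  # 62.5
```

With a fixed k for every problem, averaging per problem first gives the same result as averaging over all samples at once. The per-problem form is used here because it makes the definition explicit.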
Evaluation Results
| Method | Backbone | Date | Avg@16 |
|---|---|---|---|
| Scaf-GRPO | DeepSeek-R1-D... | 2025.10 | 25.3 |
| Vanilla GRPO | DeepSeek-R1-D... | 2025.10 | 22.9 |
| Scaf-GRPO | Qwen2.5-Math-7B | 2025.10 | 14.6 |
| LUFFY | Qwen2.5-Math-7B | 2025.10 | 12 |
| Oat-Zero | Qwen2.5-Math-7B | 2025.10 | 11.5 |
| SimpleRL-Zero | Qwen2.5-Math-7B | 2025.10 | 11 |
| Scaf-GRPO | Qwen2.5-7B | 2025.10 | 11 |
| Scaf-GRPO | Qwen2.5-Math-... | 2025.10 | 10.8 |
| Vanilla GRPO | Qwen2.5-Math-7B | 2025.10 | 10.8 |
| Vanilla GRPO | Qwen2.5-7B | 2025.10 | 9.5 |
| Vanilla GRPO | Qwen2.5-Math-... | 2025.10 | 8.2 |
| Scaf-GRPO | Llama-3.2-3B-... | 2025.10 | 0.3 |
| Vanilla GRPO | Llama-3.2-3B-... | 2025.10 | 0 |
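The table pairs Scaf-GRPO with Vanilla GRPO on the same backbones, so the absolute Avg@16 gain per backbone is a simple subtraction. The sketch below copies the scores from the table and computes those differences. Backbone names are kept exactly as displayed, including truncated ones. It assumes that rows sharing a truncated label (for example `Qwen2.5-Math-...`) refer to the same model, which this page does not confirm. The helper `gains` is hypothetical.

```python
# Avg@16 scores copied from the table; backbone labels kept as displayed,
# truncation included (assumed: same label means same backbone).
scaf_grpo = {
    "DeepSeek-R1-D...": 25.3,
    "Qwen2.5-Math-7B": 14.6,
    "Qwen2.5-7B": 11.0,
    "Qwen2.5-Math-...": 10.8,
    "Llama-3.2-3B-...": 0.3,
}
vanilla_grpo = {
    "DeepSeek-R1-D...": 22.9,
    "Qwen2.5-Math-7B": 10.8,
    "Qwen2.5-7B": 9.5,
    "Qwen2.5-Math-...": 8.2,
    "Llama-3.2-3B-...": 0.0,
}


def gains(treated: dict[str, float], baseline: dict[str, float]) -> dict[str, float]:
    """Absolute Avg@16 gain in points for each backbone present in both
    dicts, rounded to one decimal to hide floating-point noise."""
    return {b: round(treated[b] - baseline[b], 1) for b in treated if b in baseline}


for backbone, delta in gains(scaf_grpo, vanilla_grpo).items():
    print(f"{backbone}: +{delta} points")
```

Rounding to one decimal matches the precision of the reported scores. Without it, differences such as 14.6 - 10.8 would print with binary floating-point artifacts.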