Mathematical Reasoning on AIME (avg@16)
[Chart: Average Score (@16) over time. Top entry: GRPO-SG, 18.13, Oct 29, 2025. Updated 20d ago.]
Evaluation Results
Method      Setting                    Date     Average Score (@16)
GRPO-SG     Base Model=Qwen2.5-7B,...  2025.10  18.13
GRPO-SG     Base Model=Qwen2.5-7B,...  2025.10  16.88
80/20       Base Model=Qwen2.5-7B,...  2025.10  16.4
GRPO        Base Model=Qwen2.5-7B,...  2025.10  15.63
AR          Base Model=Qwen2.5-7B,...  2025.10  15.52
Lopti       Base Model=Qwen2.5-7B,...  2025.10  15.45
Lopti       Base Model=Qwen2.5-7B,...  2025.10  15.27
80/20       Base Model=Qwen2.5-7B,...  2025.10  14.94
GRPO        Base Model=Qwen2.5-7B,...  2025.10  14.79
AR          Base Model=Qwen2.5-7B,...  2025.10  14.72
Qwen2.5-7B  Base Model=Qwen2.5-7B      2025.10  4.58