Mathematical Reasoning on AIME 2025 (avg@10)
[Chart: Avg@10 over time. Best result: 13.67, SCR (Ours), Jan 12, 2026.]
Evaluation Results
| Method | Backbone | Date | Avg@10 |
| --- | --- | --- | --- |
| SCR (Ours) | Qwen2.5-7B-In... | 2026.01 | 13.67 |
| SFT + GRPO | Qwen2.5-7B-In... | 2026.01 | 11.67 |
| GRPO | Qwen2.5-7B-In... | 2026.01 | 10.33 |
| SCR-Stage I | Qwen2.5-7B-In... | 2026.01 | 9 |
| Base | Qwen2.5-7B-In... | 2026.01 | 6 |
| Self-Refine | Qwen2.5-7B-In... | 2026.01 | 5.67 |
| GRPO | Qwen2.5-3B-In... | 2026.01 | 5 |
| SCR (Ours) | Llama3.1-8B-I... | 2026.01 | 4.67 |
| SCR-SFT | Qwen2.5-7B-In... | 2026.01 | 4 |
| SCR (Ours) | Qwen2.5-3B-In... | 2026.01 | 4 |
| SFT + GRPO | Qwen2.5-3B-In... | 2026.01 | 3.67 |
| Self-Refine | Qwen2.5-3B-In... | 2026.01 | 3 |
| Base | Qwen2.5-3B-In... | 2026.01 | 2.33 |
| SCR-SFT | Qwen2.5-3B-In... | 2026.01 | 2.33 |
| SCR-Stage I | Qwen2.5-3B-In... | 2026.01 | 2 |
| SCR-Stage I | Llama3.1-8B-I... | 2026.01 | 1.67 |
| SCR-SFT | Llama3.1-8B-I... | 2026.01 | 1.33 |
| GRPO | Llama3.1-8B-I... | 2026.01 | 1 |
| SFT + GRPO | Llama3.1-8B-I... | 2026.01 | 0.67 |
| Base | Llama3.1-8B-I... | 2026.01 | 0 |
| Self-Refine | Llama3.1-8B-I... | 2026.01 | 0 |
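The page does not define Avg@10. A common reading is the mean accuracy, in percent, over 10 sampled answers per problem. The sketch below computes the metric under that assumption. The function name `avg_at_k` and the sample data are hypothetical and are not taken from the benchmark's own evaluation code.

```python
from typing import Sequence


def avg_at_k(correct: Sequence[Sequence[bool]]) -> float:
    """Mean accuracy in percent over all problems and all k samples per problem.

    ASSUMPTION: avg@k is the plain average of per-sample correctness;
    the leaderboard itself does not state its definition.
    """
    if not correct:
        raise ValueError("need at least one problem")
    k = len(correct[0])
    if k == 0 or any(len(row) != k for row in correct):
        raise ValueError("every problem needs the same non-zero number of samples")
    total_correct = sum(sum(row) for row in correct)
    return 100.0 * total_correct / (len(correct) * k)


# Hypothetical data: 3 problems, 10 sampled answers each.
results = [
    [True] * 10,                # solved in every sample
    [True] * 5 + [False] * 5,   # solved in half of the samples
    [False] * 10,               # never solved
]
print(avg_at_k(results))
```

Under the same assumption, and assuming further that all 30 AIME 2025 problems are used, a score of 13.67 would correspond to 41 correct answers out of 300 samples.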