Mathematical Reasoning on AIME (Pass@1 Accuracy, Length Exceeding Ratio)
[Chart: Pass@1 Accuracy and Length Exceeding Ratio. Best Pass@1 Accuracy: 63.7 (Jan 8, 2026).]
Updated 3d ago
Evaluation Results
| Method | Model | Date | Pass@1 Accuracy | Length Exceeding Ratio |
|--------|-------|------|-----------------|------------------------|
| - | Qwen3-4B-Instruct | 2026.01 | 63.7 | 71.3 |
| GDPO | Qwen3-4B-Instruct | 2026.01 | 56.9 | 0.1 |
| - | DeepSeek-R1-7B | 2026.01 | 55.4 | 85.6 |
| GRPO | Qwen3-4B-Instruct | 2026.01 | 54.6 | 2.5 |
| GDPO | DeepSeek-R1-7B | 2026.01 | 53.1 | 0.2 |
| GRPO | DeepSeek-R1-7B | 2026.01 | 50.2 | 2.1 |
| - | DeepSeek-R1-1.5B | 2026.01 | 29.8 | 91.5 |
| GDPO | DeepSeek-R1-1.5B | 2026.01 | 29.4 | 6.5 |
| GRPO | DeepSeek-R1-1.5B | 2026.01 | 23.1 | 10.8 |