Mathematical Reasoning on AIME 2025 (p@1, p@16)
[Chart: P@1 and P@16 scores over time. Top P@1: 82.08 (DoLa), Feb 2, 2026]
Updated 3d ago
Evaluation Results
Method  Model                Date     P@1    P@16
DoLa    Qwen3-30B-A3B-Th...  2026.02  82.08  93.33
CoT     Qwen3-30B-A3B-Th...  2026.02  81.25  90
LED     Qwen3-30B-A3B-Th...  2026.02  81.04  93.33
ST-G    Qwen3-30B-A3B-Th...  2026.02  79.79  93.33
LED     Qwen3-4B-Thinking    2026.02  76.46  90
ST      Qwen3-30B-A3B-Th...  2026.02  75.21  90
CoT     Qwen3-4B-Thinking    2026.02  74.17  90
ST      Qwen3-4B-Thinking    2026.02  72.29  83.33
DoLa    Qwen3-4B-Thinking    2026.02  72.08  86.67
ST-G    Qwen3-4B-Thinking    2026.02  67.5   90
LED     MiMo-7B-RL           2026.02  60.62  83.33
CoT     MiMo-7B-RL           2026.02  59.17  80
DoLa    MiMo-7B-RL           2026.02  59.17  80
ST-G    MiMo-7B-RL           2026.02  59.17  80
ST      MiMo-7B-RL           2026.02  54.79  80