Mathematical Reasoning on AIME 2024 (p@1 and p@16)
[Chart: p@1 and p@16 by method. Highlighted point: LED, p@1 = 87.92, Feb 2, 2026.]
Evaluation Results
Method  Model                Date     p@1    p@16
LED     Qwen3-30B-A3B-Th...  2026.02  87.92  90
DoLa    Qwen3-30B-A3B-Th...  2026.02  87.5   90
CoT     Qwen3-30B-A3B-Th...  2026.02  87.29  90
ST-G    Qwen3-30B-A3B-Th...  2026.02  86.46  90
ST      Qwen3-30B-A3B-Th...  2026.02  85.62  90
ST      Qwen3-4B-Thinking    2026.02  80     90
ST-G    Qwen3-4B-Thinking    2026.02  79.37  90
LED     Qwen3-4B-Thinking    2026.02  78.33  90
CoT     Qwen3-4B-Thinking    2026.02  76.46  90
DoLa    Qwen3-4B-Thinking    2026.02  76.25  90
DoLa    MiMo-7B-RL           2026.02  67.08  83.33
LED     MiMo-7B-RL           2026.02  66.46  83.33
ST-G    MiMo-7B-RL           2026.02  65.62  83.33
CoT     MiMo-7B-RL           2026.02  65.21  83.33
ST      MiMo-7B-RL           2026.02  56.67  83.33