Mathematical Reasoning on MATH 500 (p@1, p@16)
[Chart: P@1 and P@16 over time. Best P@1: 98.35 (ST-G), Feb 2, 2026.]
Evaluation Results
| Method | Model | Date | P@1 | P@16 |
|--------|-------|------|-----|------|
| ST-G | Qwen3-30B-A3B-Th... | 2026.02 | 98.35 | 99.6 |
| DoLa | Qwen3-30B-A3B-Th... | 2026.02 | 98.31 | 99.6 |
| CoT | Qwen3-30B-A3B-Th... | 2026.02 | 98.3 | 99.6 |
| LED | Qwen3-30B-A3B-Th... | 2026.02 | 98.3 | 99.6 |
| ST | Qwen3-30B-A3B-Th... | 2026.02 | 98.25 | 99 |
| LED | Qwen3-4B-Thinking | 2026.02 | 97.92 | 99.4 |
| CoT | Qwen3-4B-Thinking | 2026.02 | 97.86 | 99.2 |
| DoLa | Qwen3-4B-Thinking | 2026.02 | 97.85 | 99.4 |
| ST | Qwen3-4B-Thinking | 2026.02 | 97.65 | 98.6 |
| ST-G | Qwen3-4B-Thinking | 2026.02 | 97.62 | 99.2 |
| ST-G | MiMo-7B-RL | 2026.02 | 95.99 | 99 |
| CoT | MiMo-7B-RL | 2026.02 | 95.35 | 98.8 |
| LED | MiMo-7B-RL | 2026.02 | 95.35 | 99 |
| DoLa | MiMo-7B-RL | 2026.02 | 95.07 | 98.8 |
| ST | MiMo-7B-RL | 2026.02 | 94.06 | 98.4 |