Mathematical Reasoning on AIME25 (Accuracy, Efficiency k)
[Chart: Accuracy (%) and Efficiency (k) over time. Labeled data point: Qwen3-4B-Inst-2507, 46.7 Accuracy (%), Feb 5, 2026.]
Updated 4d ago
Evaluation Results
Method              Settings                   Date     Accuracy (%)  Efficiency (k)
Qwen3-4B-Inst-2507  Chat Template=Off, Dec...  2026.02  46.7          1
Qwen3-4B-Inst-2507  Chat Template=Off, Dec...  2026.02  26.7          2.3
Qwen3-4B-Inst-2507  Chat Template=Off, Dec...  2026.02  23.3          1
Qwen3-4B-Inst-2507  Chat Template=Off, Dec...  2026.02  23.3          2.8
L3.1-8B-Magpie      Chat Template=On, Deco...  2026.02  0             1
L3.1-8B-Magpie      Chat Template=On, Deco...  2026.02  0             1
L3.1-8B-Magpie      Chat Template=On, Deco...  2026.02  0             5.5