Mathematical Reasoning on GSM8K (Pass@1, Pass@2)
[Chart: Pass@1 and Pass@2 on GSM8K. Best Pass@1: 95.72 (Phi-4-mini + Mistral3-3B), Jan 29, 2026.]
Updated 3d ago
Evaluation Results
| Method | Training Framework | Date | Pass@1 | Pass@2 |
| --- | --- | --- | --- | --- |
| Phi-4-mini + Mistral3-3B | COR... | 2026.01 | 95.72 | 96.72 |
| Phi-4-mini-reasoning | COR... | 2026.01 | 90.14 | 91.1 |
| Mistral3-3B-Reasoning | COR... | 2026.01 | 89.25 | 90.23 |
| Phi-4-mini + Mistral3-3B + Oracle | SD-... | 2026.01 | 83.25 | 85.5 |
| Phi-4-mini-reasoning | SD-... | 2026.01 | 80.6 | 84.35 |
| Mistral3-3B-Reasoning | SD-... | 2026.01 | 79.1 | 83.4 |
| Phi-4-mini + Mistral3-3B + Oracle | Bas... | 2026.01 | 74.85 | 80.2 |
| Phi-4-mini-reasoning | Bas... | 2026.01 | 72.4 | 78.1 |
| Mistral3-3B-Reasoning | Bas... | 2026.01 | 70.95 | 76.7 |