Mathematical Reasoning on GSM8K (Acc & NLDD)
[Chart: Accuracy and NLDD over time. Highlighted point: Llama-3.1-8B, Accuracy 100, Feb 4, 2026]
Updated 4d ago
Evaluation Results
Method               Configuration              Date     Accuracy  NLDD
Llama-3.1-8B         Regime=Faithful Regime...  2026.02  100       96.7
DeepSeek-Coder-6.7B  Regime=Faithful Regime...  2026.02  100       96.1
Gemma-2-9B           Regime=Anti-Faithful R...  2026.02  100       61.5