
Mathematical Reasoning on GSM8K (Accuracy, Avg., Drop ↓)

94.49 Accuracy

BF16 Baseline

Jan 21, 2026
Updated 4d ago

Evaluation Results

Date      Accuracy   Avg.    Drop ↓
2026.01   94.49      70.63   -
2026.01   92.42      60.78   -9.85
2026.01   92.12      58.28   -12.35
2026.01   84.61      48.72   -
2026.01   79         41.1    -
2026.01   78.62      38.39   -10.33
2026.01   77.94      41.31   -7.41
2026.01   48.62      21.44   -19.66
2026.01   33.06      17.34   -23.76
2026.01   0.76       2.11    -46.61
2026.01   0          4.84    -36.26
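
The Drop values appear to be the change in Avg. relative to a baseline row, that is, a row whose Drop is "-". The page does not say which baseline each row is compared against, so the pairings below are an assumption inferred from the arithmetic alone. The helper name `avg_drop` is hypothetical. A minimal sketch in Python:

```python
def avg_drop(avg: float, baseline_avg: float) -> float:
    """Change in Avg. relative to a baseline Avg., rounded to two decimals."""
    return round(avg - baseline_avg, 2)


# (Avg., assumed baseline Avg., reported Drop) taken from the results above.
# The baseline pairing is inferred from the numbers, not stated on the page.
rows = [
    (60.78, 70.63, -9.85),
    (58.28, 70.63, -12.35),
    (38.39, 48.72, -10.33),
    (41.31, 48.72, -7.41),
    (21.44, 41.1, -19.66),
    (17.34, 41.1, -23.76),
]

for avg, baseline_avg, reported in rows:
    computed = avg_drop(avg, baseline_avg)
    # Compare with a small tolerance to avoid floating-point noise.
    status = "matches" if abs(computed - reported) < 1e-9 else "differs"
    print(f"Avg. {avg} vs baseline {baseline_avg}: {computed} ({status})")
```

Under this reading, a more negative Drop means a larger loss in Avg. relative to the assumed baseline.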