
Mathematical Reasoning on GSM8K (test) (Top@1, Time, Tokens)

Top-1 Accuracy: 93.33

Full Reasoning

[Chart: Top-1 Accuracy by date; y-axis ticks 62.8164, 70.7382, 78.66, 86.5818; x-axis tick Jan 7, 2026]
Updated 4d ago

Evaluation Results

Date      Top@1    Time     Tokens
2026.01   93.33    35.24    493
2026.01   93.03    31.59    442
2026.01   92.23    33.49    381
2026.01   92.12    31.17    459
2026.01   92.04    18.07    251
2026.01   91.62    27.09    379
2026.01   91.15    22.59    319
2026.01   88.09    11.39    266
2026.01   87.82    15.28    278
2026.01   87.57    18.79    440
2026.01   87.49    14.39    383
2026.01   86.43    17.05    417
2026.01   85.97    15.22    359
2026.01   85.52    13.45    317
2026.01   84.43    156.58   2,191
2026.01   80.51    12.72    267
2026.01   78.47    23.66    498
2026.01   78.32    19.85    419
2026.01   78.18    15.33    385
2026.01   77.45    21.16    486
2026.01   77.41    4.53     106
2026.01   77.04    18.49    403
2026.01   76.63    15.52    340
2026.01   63.99    3.99     84