
LLM Performance Estimation on GSM8K (test)

SparseEval: 1.619 MAE (%)

[Chart: MAE (%) by date; date shown: Feb 8, 2026]

Evaluation Results

Date      MAE (%)   Metric 2 (unlabeled)
2026.02   1.619     0.936
2026.02   1.754     0.931
2026.02   1.96      0.925
2026.02   2.321     0.908
2026.02   2.424     0.887
2026.02   2.774     0.88
2026.02   2.778     0.906
2026.02   3.161     0.871
2026.02   3.305     0.872
2026.02   3.631     0.916
2026.02   3.756     0.878
2026.02   3.984     0.832
2026.02   4.003     0.9
2026.02   4.157     0.885
2026.02   4.203     0.912
2026.02   4.271     0.892
2026.02   4.433     0.844
2026.02   5.275     0.802
2026.02   5.295     0.842
2026.02   5.412     0.833