
Math Reasoning on Average (MATH, GSM8K, AIME 2025)

0.92 Task Accuracy

GPT-OSS-20B

[Chart: Task Accuracy over time, Feb 10, 2026]
Updated 4d ago

Evaluation Results

Method    Links
2026.02   0.92    0.64   0.58   0.46
2026.02   0.914   0.7    0.63   0.55
2026.02   0.866   0.78   0.68   0.62
2026.02   0.827   0.8    0.72   0.66
2026.02   0.809   0.79   0.68   0.67
2026.02   0.763   0.76   0.63   0.66
2026.02   0.724   0.84   0.64   0.61
2026.02   0.583   0.85   0.65   0.69
2026.02   0.525   0.68   0.63   0.73