Mathematical Reasoning on GSM8K (PEEM & Overall Scores)

Accuracy: 95.3

GPT-4o-mini

[Chart: results over time, Mar 11, 2026 to May 5, 2026]
Updated 22d ago

Evaluation Results

Date      Accuracy   Scores
2026.03   95.3       4.882   4.847   4.701
2026.03   94.1       4.81    4.855   4.712
2026.03   86         4.654   4.723   4.506
2026.03   83.6       4.48    4.646   4.405
2026.05   58.61      -       -       -
2026.03   57.1       4.354   4.582   4.125
2026.05   56.25      -       -       -