
Mathematical Reasoning on GSM8K (PEEM & Overall Scores)

Accuracy: 95.3

GPT-4o-mini

[Chart: Accuracy, Mar 11, 2026]
Updated 1mo ago

Evaluation Results

Date      Accuracy   PEEM & Overall scores
2026.03   95.3       4.882   4.847   4.701
2026.03   94.1       4.81    4.855   4.712
2026.03   86         4.654   4.723   4.506
2026.03   83.6       4.48    4.646   4.405
2026.03   57.1       4.354   4.582   4.125