Math Reasoning on GSM8K

[Chart: Accuracy (GSM8K) over time, Oct 28, 2025 to Apr 2, 2026. Labeled entry: GPT-5 high.]
Updated 9d ago

Evaluation Results

Date      Accuracy (GSM8K)   (unlabeled)
2025.11   100                -
2025.11   100                -
2025.11   98.9               -
2025.11   98.9               -
2026.04   78.09              -
2026.04   77.31              -
2026.04   76.32              -
2026.04   75.92              -
2026.04   75.89              -
2025.11   75.7               -
2026.04   74.37              -
2025.10   66.4               -
2025.10   65.8               -
2025.10   65.5               -
2025.10   64.6               -
2025.10   64.3               -
2026.03   61.2               56.3
2026.04   60.8               -
2026.03   59.8               55.5
2026.03   59.7               55.3
2026.03   59                 53.4
2026.03   57.9               53
2026.03   57.8               52.3
2026.03   57.7               49.3
2026.04   57.22              -
2026.03   57.2               52.6
2026.03   57                 50.2
2026.04   56.52              -
2026.03   56.1               50.1
2026.03   56.1               48.1
2026.03   56                 47.3
2026.04   55.5               -
2026.04   55.34              -
2025.10   55                 -
2025.10   53.7               -
2025.10   53.2               -
2025.10   53.1               -
2026.03   52.9               50.4
2025.10   52.9               -
2026.03   51.4               41.6
2026.03   50.8               45.3
2026.03   50.1               42.5
2026.03   49.5               44.3
2026.03   49.3               37.7
2026.03   40.6               46.8
2026.03   37.2               43
2026.03   31.2               38.4
2026.03   26.2               34
2026.03   18.7               29.9