
Math Reasoning on GSM8K

[Figure: Accuracy (GSM8K) over time, Oct 28, 2025 to May 27, 2026; labeled entry: GPT-5 high]

Evaluation Results

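| Method | Date | Accuracy (GSM8K) | Second score | Links |
|---|---|---|---|---|
| | 2025.11 | 100 | - | |
| | 2025.11 | 100 | - | |
| | 2025.11 | 98.9 | - | |
| | 2025.11 | 98.9 | - | |
| | 2026.04 | 96.8 | - | |
| | 2026.04 | 96.5 | - | |
| | 2026.04 | 96.5 | - | |
| | 2026.04 | 95.6 | - | |
| | 2026.04 | 95 | - | |
| | 2026.04 | 94.5 | - | |
| | 2026.04 | 93.7 | - | |
| | 2026.05 | 88.93 | - | |
| | 2026.05 | 88.79 | - | |
| | 2026.05 | 88.63 | - | |
| | 2026.05 | 86.96 | - | |
| | 2026.05 | 86.54 | - | |
| | 2026.05 | 85.67 | - | |
| | 2026.05 | 84.91 | - | |
| | 2026.05 | 84.76 | - | |
| | 2026.05 | 83.92 | - | |
| | 2026.05 | 83.39 | - | |
| | 2026.05 | 82.34 | - | |
| | 2026.05 | 81.5 | - | |
| | 2026.05 | 80.44 | - | |
| | 2026.04 | 78.09 | - | |
| | 2026.04 | 77.31 | - | |
| | - | 76.88 | - | |
| | 2026.04 | 76.32 | - | |
| | 2026.04 | 75.92 | - | |
| | 2026.04 | 75.89 | - | |
| | 2025.11 | 75.7 | - | |
| | 2026.04 | 74.37 | - | |
| | - | 72.33 | - | |
| | 2026.04 | 71 | - | |
| | 2026.04 | 68.2 | - | |
| | 2026.04 | 68.1 | - | |
| | 2025.10 | 66.4 | - | |
| | 2026.04 | 66.3 | - | |
| | 2026.04 | 66.2 | - | |
| | 2025.10 | 65.8 | - | |
| | 2025.10 | 65.5 | - | |
| | 2025.10 | 64.6 | - | |
| | 2025.10 | 64.3 | - | |
| | 2026.04 | 62.5 | - | |
| | 2026.03 | 61.2 | 56.3 | |
| | 2026.04 | 60.8 | - | |
| | 2026.04 | 60.7 | - | |
| | 2026.03 | 59.8 | 55.5 | |
| | 2026.03 | 59.7 | 55.3 | |
| | 2026.03 | 59 | 53.4 | |
| | 2026.04 | 58.9 | - | |
| | 2026.03 | 57.9 | 53 | |
| | 2026.03 | 57.8 | 52.3 | |
| | 2026.03 | 57.7 | 49.3 | |
| | 2026.04 | 57.22 | - | |
| | 2026.03 | 57.2 | 52.6 | |
| | 2026.03 | 57 | 50.2 | |
| | 2026.04 | 56.8 | - | |
| | 2026.04 | 56.52 | - | |
| | 2026.05 | 56.18 | - | |
| | 2026.03 | 56.1 | 50.1 | |
| | 2026.03 | 56.1 | 48.1 | |
| | 2026.03 | 56 | 47.3 | |
| | 2026.04 | 56 | - | |
| | 2026.04 | 55.5 | - | |
| | 2026.04 | 55.34 | - | |
| | 2025.10 | 55 | - | |
| | 2026.04 | 54.6 | - | |
| | 2025.10 | 53.7 | - | |
| | 2025.10 | 53.2 | - | |
| | 2025.10 | 53.1 | - | |
| | 2026.03 | 52.9 | 50.4 | |
| | 2025.10 | 52.9 | - | |
| | 2026.04 | 51.9 | - | |
| | 2026.04 | 51.6 | - | |
| | 2026.03 | 51.4 | 41.6 | |
| | 2026.04 | 51 | - | |
| | 2026.03 | 50.8 | 45.3 | |
| | 2026.03 | 50.1 | 42.5 | |
| | - | 49.89 | - | |
| | 2026.03 | 49.5 | 44.3 | |
| | 2026.03 | 49.3 | 37.7 | |
| | 2026.04 | 49.2 | - | |
| | - | 49.13 | - | |
| | - | 48.6 | - | |
| | 2026.04 | 47.8 | - | |
| | 2026.05 | 47.16 | - | |
| | 2026.05 | 43.67 | - | |
| | 2026.04 | 43.4 | - | |
| | 2026.05 | 41.7 | - | |
| | 2026.04 | 40.9 | - | |
| | 2026.03 | 40.6 | 46.8 | |
| | 2026.04 | 40 | - | |
| | 2026.04 | 39.4 | - | |
| | 2026.05 | 39.27 | - | |
| | 2026.05 | 38.29 | - | |
| | 2026.04 | 37.5 | - | |
| | 2026.03 | 37.2 | 43 | |
| | 2026.04 | 37.2 | - | |
| | - | 36.39 | - | |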
Showing 100 of 131 rows
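GSM8K accuracy, as reported on leaderboards like this one, is typically the percentage of test problems whose final numeric answer matches the reference; in the GSM8K data, each reference solution ends with a `#### <answer>` line. The sketch below assumes that exact-match convention plus a simple "last number in the output" extraction rule. Individual entries may have used different prompts or extraction rules, and the function names (`extract_reference`, `extract_prediction`, `is_correct`, `accuracy`) and toy examples are hypothetical, not any entry's actual evaluation harness.

```python
import re
from typing import List, Optional

# Integers or decimals, optionally signed, possibly with thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_reference(answer_field: str) -> str:
    """Return the final answer that follows the '####' marker in a GSM8K reference."""
    return answer_field.split("####")[-1].strip().replace(",", "")


def extract_prediction(model_output: str) -> Optional[str]:
    """Assumed extraction rule: use the last number that appears in the model output."""
    matches = _NUMBER.findall(model_output)
    if not matches:
        return None
    # Strip thousands separators (and any trailing comma the regex picked up).
    return matches[-1].replace(",", "")


def is_correct(model_output: str, answer_field: str) -> bool:
    """Compare numeric values, so '18.0' and '18' count as the same answer."""
    pred = extract_prediction(model_output)
    if pred is None:
        return False
    try:
        return float(pred) == float(extract_reference(answer_field))
    except ValueError:
        return False


def accuracy(outputs: List[str], answers: List[str]) -> float:
    """Percentage (0 to 100) of problems whose extracted answer matches the reference."""
    if len(outputs) != len(answers):
        raise ValueError("outputs and answers must have the same length")
    if not outputs:
        return 0.0
    correct = sum(is_correct(o, a) for o, a in zip(outputs, answers))
    return 100.0 * correct / len(outputs)


if __name__ == "__main__":
    # Toy, made-up examples: the first is correct, the second is not.
    outputs = [
        "She pays 3 * 6 = 18 dollars. The answer is 18.",
        "So the total is 1,250.",
    ]
    answers = [
        "3 * 6 = <<3*6=18>>18\n#### 18",
        "500 + 700 = <<500+700=1200>>1200\n#### 1200",
    ]
    print(accuracy(outputs, answers))  # 50.0
```

Comparing with `float()` rather than raw strings treats `18` and `18.0` as equal; a stricter harness might require an exact string match, or a looser one might extract the answer from a fixed phrase such as "The answer is".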