
Math Reasoning on AIME 2025 (Accuracy %)

94.6 Accuracy

GPT-5 high

[Chart: accuracy (%) over time, Jul 2, 2025 to May 9, 2026]
Updated 16d ago

Evaluation Results

Method  Links
2025.11  94.6
2026.05  93.5
2026.05  90.9
2026.05  88.4
2025.11  88
2026.05  87.1
2025.11  87
2026.05  86.2
2026.05  85.5
2026.05  84.9
2026.05  84.1
2026.05  83.4
2026.05  83.1
2026.05  83.1
2026.05  83.1
2026.05  81.4
2026.05  80.7
2026.05  80.3
2026.05  80.2
2026.05  80
2026.05  77
2026.05  76.8
2026.05  76.8
2026.05  76.6
2026.05  57.2
2026.05  57
2026.05  56.1
2026.05  56
2026.05  55.2
2026.05  55.2
2026.05  55.1
2026.05  54.5
2026.05  53.3
2026.05  52.4
2025.11  36.7
2025.11  33.3
2025.07  26.4
2026.05  24.2
2025.07  23.1
2025.07  23.1
2026.05  23
2026.05  22.8
2026.05  22.4
2025.07  22.3
2025.07  20.1
2026.05  19.3
2025.07  16.5
2025.07  16.4
2026.05  15.9
2025.07  15.3
2026.05  13.3
2025.07  11.9
2026.05  11.8
2026.05  11.3
2026.05  10.8
2026.05  9.4
2026.05  9
2026.05  7.9
2025.07  6.8
         4.9