Mathematical Reasoning on AIME 2025 (test)

88.9 Pass@1 Rate

OpenAI-o3

[Chart: leaderboard trend, Sep 26, 2025 to Feb 26, 2026]
Updated 6d ago

Evaluation Results

Method Links
2026.01  88.9   -      -      -      -
2026.01  87.5   -      -      -      -
2026.01  85.6   -      -      -      -
2026.01  84.2   -      -      -      -
2026.01  84     -      -      -      -
2026.01  83     -      -      -      -
2026.01  81.5   -      -      -      -
2026.01  74.4   -      -      -      -
2026.01  73.5   -      -      -      -
2026.01  73.3   -      -      -      -
2026.02  70     -      -      -      -
2026.01  70     -      -      -      -
2026.01  69.5   -      -      -      -
2026.02  68.33  -      -      -      -
2026.01  67.4   -      -      -      -
2026.02  62.5   -      -      -      -
2026.01  60     -      -      -      -
2026.02  56.66  -      -      -      -
2026.02  53.3   -      -      -      -
2026.02  52.5   -      -      -      -
2026.02  50     -      -      -      -
2026.01  49.6   -      -      -      -
2026.02  46.67  -      -      -      -
2026.02  40.83  -      -      -      -
2026.02  40.66  -      -      -      -
2026.02  38     -      -      -      -
2025.09  37.4   -      56.29  -      -
2026.01  36.6   643    -      -      -
2026.01  35.3   354.3  -      -      -
2025.09  34.9   -      57.92  -      -
2025.09  33.44  -      51.62  -      -
2025.09  33.33  -      45.86  -      -
2026.01  33.3   610.7  -      -      -
2026.01  33.3   633.1  -      -      -
2025.09  33.02  -      52.27  -      -
2025.09  32.71  -      56.66  -      -
2026.01  32.6   634.7  -      -      -
2025.09  32.5   -      48.01  -      -
2026.02  31.67  -      -      -      -
2026.01  31.3   643    -      -      -
2026.01  31.3   647.6  -      -      -
2025.09  31.15  -      46.59  -      -
2026.02  30     -      -      -      -
2026.01  29.3   443.5  -      -      -
2026.01  29.3   648.5  -      -      -
2026.01  28.6   213.9  -      -      -
2026.02  28.33  -      -      -      -
2026.01  26     434.7  -      -      -
2026.02  25.1   -      -      -      47.1
2026.01  24.6   436.3  -      -      -
2026.02  24.2   -      -      -      48.2
2026.02  24.2   -      -      -      45.6
2026.02  23.3   -      -      -      47.9
2026.02  22.8   -      -      -      38.6
2026.01  22.6   423.4  -      -      -
2026.02  21.4   -      -      -      39.9
2026.01  20     445.1  -      -      -
2026.02  18.6   -      -      -      26.7
2026.02  17.6   -      -      -      34.1
2026.01  16.6   445.7  -      -      -
2026.01  16.2   -      -      -      -
2026.01  14.6   441.3  -      -      -
2026.01  14     -      -      -      -
2025.05  -      -      -      5.5    -
2025.05  -      -      -      13.8   -
2025.05  -      -      -      17.1   -
2025.05  -      -      -      15.7   -
2025.05  -      -      -      18.7   -
2025.05  -      -      -      13.4   -
2025.05  -      -      -      18.3   -
2025.05  -      -      -      1      -
2025.05  -      -      -      33.4   -
2025.05  -      -      -      19.1   -
2025.05  -      -      -      31.5   -
2025.06  -      -      -      6.2    -
2025.06  -      -      -      13.3   -
2025.06  -      -      -      11.2   -
2025.06  -      -      -      16.8   -
2025.06  -      -      -      39.5   -
2025.06  -      -      -      40.8   -