Mathematical Reasoning on AIME 2024

Accuracy: 94

GPT-5-Mini-R

[Chart: results over time, Jan 31, 2025 to Mar 17, 2026; y-axis ticks at 49.28, 60.89, 72.5, 84.11]
Updated 24d ago

Evaluation Results

Method    Links
94--
2026.02
93.5--
2026.03
93.33--
2026.03
93--
2026.02
92.1--
2026.03
90.16--
2026.03
90--
2025.09
88.7--
2026.02
87.9--
2026.02
87.7--
2026.03
86.67--
2026.03
86.67--
2026.02
86.3--
2026.02
85.7--
2026.02
85--
2025.01
83.8--
2025.12
83.8--
2025.10
82.92--
2025.12
81.4--
2025.10
80.42--
2025.07
80.1--
2025.10
80--
2025.01
79.8--
2025.12
79.8--
2025.07
79.6--
2025.12
79.58--
2025.10
79.17--
2025.07
78.8--
2025.07
78.3--
2025.07
77.5--
2025.07
76.6--
2025.12
76.5--
2025.12
76.5--
2025.12
76.2--
2025.07
76--
2025.10
75.83--
2025.01
75.8--
2025.12
75.33--
2025.12
75--
2025.12
75--
2025.10
74.58--
2025.12
73.54--
2025.12
73.33--
2025.12
73.33--
2025.07
73.3--
2025.07
72.8--
2026.03
72.6--
2025.12
71.5--
2025.07
71.4--
2025.07
70.6--
2025.07
70.2--
2025.10
70--
2025.07
69.7--
2025.07
69.1--
2025.10
67.92--
2025.12
66.67--
2025.07
63.2--
2026.03
62.71--
2025.07
62.7--
2025.12
62.1--
2025.12
62--
2025.01
61.7--
2025.12
61.7--
2025.12
61.46--
2025.07
58.9--
2025.07
58.6--
2025.01
58.3--
2025.12
58.3--
2025.07
58.1--
2025.09
57.58--
2025.01
57.1--
2025.12
57.1--
2025.07
56.8--
2025.01
56.7--
2025.01
56.7--
2025.01
56.7--
2025.01
56.7--
2025.12
56.7--
2025.12
56.7--
2025.07
56.7--
2026.01
56.6--
2025.01
56.3--
2025.01
56.3--
2025.07
56.3--
2025.07
55.9--
2025.07
55.5--
2025.12
55.3--
2025.09
55.1--
2025.12
54.5--
2026.01
54.5--
2025.07
54--
2025.07
53.4--
2025.01
53.3--
2025.01
53.3--
2026.03
53-55
2026.01
52.9--
2026.01
52.4--
2026.01
51.6--
2025.09
51.1--
2026.03
51-52
Showing 100 of 380 rows