
Mathematical Problem Solving on AIME 2025 (Top-1 Accuracy)

Top-1 Accuracy (%): 91.67

Majority Vote

[Chart: Top-1 Accuracy (%) over time, Oct 7, 2025 to May 8, 2026]
Updated 23d ago

Evaluation Results

Method  Links

Date     Top-1 Accuracy (%)
2026.04  91.67  -      -      -      -
2026.04  90     -      -      -      -
2026.04  90     -      -      -      -
2026.04  80     -      -      -      -
2026.04  78.75  -      -      -      -
2026.04  78.75  -      -      -      -
2026.04  76.67  -      -      -      -
2026.04  76.67  -      -      -      -
2026.04  76.67  -      -      -      -
2026.04  76.67  -      -      -      -
2026.04  70     -      -      -      -
2026.04  67.08  -      -      -      -
2026.05  50     -      -      -      15
2025.10  26.97  -      -      -      -
2025.10  25.16  -      -      -      -
2025.10  23.13  -      -      -      -
2025.10  22.99  -      -      -      -
2025.10  22.06  -      -      -      -
2025.10  20.8   -      -      -      -
2025.10  20.25  -      -      -      -
2025.10  19.6   -      -      -      -
2025.10  19.22  -      -      -      -
2025.10  19.1   -      -      -      -
2025.10  19.09  -      -      -      -
2025.10  19.03  -      -      -      -
2025.10  18.6   -      -      -      -
2025.10  18.45  -      -      -      -
2025.10  18.4   -      -      -      -
2025.10  18.4   -      -      -      -
2025.10  18.38  -      -      -      -
2025.10  17.83  -      -      -      -
2025.10  16.4   -      -      -      -
2025.10  15.8   -      -      -      -
2025.10  13.55  -      -      -      -
2025.10  13.35  -      -      -      -
2025.10  12.61  -      -      -      -
2025.10  11.54  -      -      -      -
2026.05  11.46  -      -      -      -
2026.05  11.46  -      -      -      -
-        10.83  -      -      -      -
2025.10  9.8    -      -      -      -
-        8.12   -      -      -      -
2026.05  7.71   -      -      -      -
2026.05  7.08   -      -      -      -
2026.05  4.17   -      -      -      -
2025.10  2.6    -      -      -      -
2025.10  -      -      3.33   -      -
2025.10  -      5.13   3.6    2.52   -
2025.10  -      9.45   7.4    4.32   -
2025.10  -      7.32   4.5    2.87   -
2025.10  -      10.82  7.28   3.42   -
2025.10  -      -      6.66   -      -
2025.10  -      17.41  15.8   14.9   -
2025.10  -      14.19  16.2   16.81  -
2025.10  -      21.61  19.8   17.61  -
2025.10  -      28.83  25.1   21.96  -