
Mathematical Reasoning on AIME 2025 (Accuracy and Average Tokens)

[Figure: accuracy of reported methods over time, Aug 12, 2025 to May 16, 2026. Highlighted top result: PC-cubic, 96.7 accuracy.]
Updated 21h ago

Evaluation Results

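| Date | Accuracy (%) | Avg. tokens |
|---|---|---|
| 2026.05 | 96.7 | - |
| 2026.05 | 94.6 | - |
| 2026.05 | 94.6 | - |
| 2026.05 | 94.2 | - |
| 2026.05 | 94.1 | - |
| 2026.05 | 93.5 | - |
| 2026.05 | 93.1 | - |
| 2026.05 | 92.5 | - |
| 2026.05 | 92.3 | - |
| 2026.05 | 91.9 | - |
| 2026.05 | 91.3 | - |
| 2026.05 | 90.4 | - |
| 2026.05 | 90.1 | - |
| 2026.05 | 88.8 | - |
| 2026.05 | 88.3 | - |
| - | 88 | - |
| 2026.05 | 87.5 | - |
| 2026.04 | 86.7 | - |
| 2026.05 | 86.67 | - |
| 2026.05 | 86.67 | - |
| 2026.04 | 83.3 | 17,498 |
| 2025.08 | 82.22 | - |
| 2026.05 | 80 | 10,575 |
| 2026.05 | 80 | - |
| 2026.04 | 73.3 | 12,859 |
| 2026.04 | 73.3 | - |
| 2026.04 | 73.3 | - |
| 2026.04 | 73.3 | - |
| 2026.05 | 73.3 | 12,687 |
| 2026.05 | 73.3 | 10,661 |
| 2026.04 | 70 | 20,773 |
| 2026.04 | 70 | - |
| 2025.08 | 70 | - |
| 2026.05 | 70 | 15,730 |
| 2025.08 | 68.89 | - |
| 2025.08 | 67.04 | - |
| 2026.04 | 66.7 | 15,407 |
| 2026.04 | 66.7 | 11,844 |
| 2026.04 | 66.7 | - |
| 2025.08 | 65.56 | - |
| 2026.04 | 63.3 | 12,321 |
| 2026.04 | 63.3 | 18,049 |
| 2026.04 | 63.3 | 14,249 |
| 2026.04 | 63.3 | 10,783 |
| 2026.05 | 62.1 | - |
| 2025.08 | 60.5 | - |
| 2026.04 | 60 | - |
| 2026.04 | 60 | 17,492 |
| 2026.04 | 60 | 11,932 |
| 2026.04 | 60 | 11,172 |
| 2026.04 | 60 | 13,838 |
| 2026.04 | 60 | 12,814 |
| 2026.04 | 60 | 13,356 |
| 2026.04 | 60 | - |
| 2025.08 | 56.8 | - |
| 2026.04 | 56.7 | - |
| 2026.04 | 53.3 | 12,638 |
| 2026.04 | 53.3 | - |
| 2026.04 | 53.3 | - |
| 2026.05 | 52.5 | - |
| 2025.08 | 52.5 | - |
| 2026.05 | 52.1 | - |
| 2025.08 | 51.11 | - |
| 2025.08 | 51.11 | - |
| 2026.05 | 50.8 | - |
| 2026.04 | 50 | 13,330 |
| 2026.04 | 50 | - |
| 2026.04 | 50 | - |
| 2026.04 | 50 | - |
| 2026.05 | 50 | - |
| 2026.05 | 49.9 | - |
| 2026.04 | 49.2 | - |
| 2026.05 | 49 | - |
| 2026.05 | 48.8 | - |
| 2026.05 | 47.6 | - |
| 2026.05 | 47.3 | - |
| 2026.04 | 46.7 | 11,804 |
| 2026.04 | 46.7 | 12,503 |
| 2026.04 | 46.7 | - |
| 2026.04 | 46.67 | - |
| 2026.04 | 46.67 | - |
| 2026.05 | 46.67 | - |
| 2026.05 | 46.67 | - |
| 2026.05 | 46.5 | - |
| 2026.05 | 44.2 | - |
| 2026.05 | 44.2 | - |
| 2026.04 | 43.3 | 12,994 |
| 2026.04 | 43.3 | - |
| 2026.04 | 43.3 | - |
| 2026.05 | 42.5 | - |
| 2025.09 | 41.71 | - |
| 2025.08 | 41.3 | - |
| 2025.08 | 41.2 | - |
| 2026.04 | 40.8 | - |
| 2026.04 | 40 | 15,796 |
| 2026.04 | 40 | 8,482 |
| 2026.04 | 40 | 16,427 |
| 2026.04 | 40 | 13,146 |
| 2026.04 | 40 | - |
| 2026.05 | 40 | - |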
Showing 100 of 311 rows