
Mathematical Reasoning on AIME 2025 (Pass@1, #Tokens)

76.7 Pass@1 Accuracy

REBALANCE

[Chart: Pass@1 accuracy over time, Sep 2025 to May 2026]

Evaluation Results

Date     Pass@1 (%)  #Tokens
2026.03  76.7        9,417
2026.03  73.3        14,552
2026.05  53.3        -
2026.05  46.7        -
2025.10  43.7        -
2025.10  40.9        -
2025.10  40.4        -
2025.10  40          -
2026.05  40          -
2025.10  38.5        -
2025.09  35.42       -
2025.09  33.85       -
2025.09  32.92       -
2025.09  30.52       -
2025.10  26.6        -
2025.10  26.3        -
2025.09  25          -
2025.10  24.8        -
2025.09  24.79       -
2025.10  24.7        -
2025.09  24.37       -
2025.09  24.06       -
2025.09  22.6        -
2025.10  22.5        -
2026.05  22.5        -
2025.10  21.4        -
2025.10  20.6        -
2026.05  14.6        -
2026.05  12.9        -
2026.05  12.1        -
2025.09  10          -
2025.09  8.85        -
2025.09  6.35        -
2026.05  3.5         -
2026.05  0.1         -
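The page reports Pass@1 without defining it. The sketch below assumes the common unbiased pass@k estimator: for each problem, draw n samples, count c correct ones, estimate 1 - C(n-c, k)/C(n, k), and average over problems. For k = 1 this reduces to the mean per-problem fraction c/n. The function names and the (n, c) input format are hypothetical, and the page does not say how many samples each entry used. If AIME 2025's 30 problems (AIME I and II) are each answered once, scores move in steps of 1/30, so 76.7 would correspond to 23 of 30 solved. Fractional values such as 35.42 suggest that some entries average over several samples per problem.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem (assumed definition).

    n: number of sampled answers, c: number of correct answers, k: budget.
    """
    if n - c < k:
        # Every size-k subset contains at least one correct answer.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Mean pass@1 over problems, returned as a percentage.

    results: hypothetical list of (n, c) pairs, one per problem.
    """
    scores = [pass_at_k(n, c, 1) for n, c in results]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical example: 30 problems, one sample each, 23 solved.
single_sample = [(1, 1)] * 23 + [(1, 0)] * 7
print(f"{benchmark_pass_at_1(single_sample):.1f}")  # 76.7
```

With a single sample per problem, pass@1 is plain accuracy. Drawing several samples and using the estimator lowers the variance of the score without changing its expected value.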