
Mathematical Reasoning on AIME '24 (Pass@1, Pass@32)

86.7 Pass@1 Accuracy

Untrained

[Chart: Pass@1 accuracy over time, Apr 1, 2026 to May 14, 2026]
Updated 19d ago

Evaluation Results

Date      Pass@1   Pass@32
2026.05   86.7     -
2026.05   86.2     -
2026.05   84.8     -
2026.05   84.2     -
2026.05   83.3     -
2026.05   83.3     -
2026.05   81.9     -
2026.05   79.8     -
2026.05   78.3     -
2026.05   74       -
2026.05   61.9     -
2026.05   58.8     -
2026.05   57.1     -
2026.05   55       -
2026.05   50       -
2026.04   39.9     80
2026.04   39.8     80
2026.04   38.2     76.7
2026.04   36.8     73.3
2026.04   35.9     80
2026.04   33.4     76.7
2026.04   30.8     73.3
2026.04   22.1     66.7
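
The page does not say how its Pass@1 and Pass@32 scores are computed. As a rough sketch only, the snippet below assumes the commonly used unbiased pass@k estimator, where each problem is sampled n times and c of those samples are correct. The function names and the per-problem counts are hypothetical and for illustration; they are not taken from this leaderboard.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # 1 minus the probability that k samples drawn without replacement are all wrong.
    # math.comb returns 0 when fewer than k wrong samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Mean pass@k over all problems, expressed as a percentage."""
    total = sum(pass_at_k(n, c, k) for c in correct_counts)
    return 100.0 * total / len(correct_counts)


# Hypothetical counts for 30 problems with n = 32 samples each (made-up data).
counts = [32] * 10 + [16] * 8 + [1] * 4 + [0] * 8

pass1 = benchmark_pass_at_k(counts, n=32, k=1)
pass32 = benchmark_pass_at_k(counts, n=32, k=32)
print(f"Pass@1 = {pass1:.1f}, Pass@32 = {pass32:.1f}")
```

Under this assumption with n = k = 32, a problem counts toward Pass@32 as soon as any one of its samples is correct, so on a 30-problem set the score moves in steps of 100/30. Pass@1, by contrast, is the average per-sample accuracy and can take finer-grained values.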