
Mathematical Reasoning on AIME 25 (Acc, Tok, CR)

Accuracy: 92.6

Vanilla

[Chart: results over time, Mar 13, 2026 to Apr 13, 2026]
Updated 4d ago

Evaluation Results

Date      Acc    Tok      CR
2026.03   92.6   22,124   100
2026.03   88.1   23,694   100
2026.03   87.6   15,407   66.3
2026.03   83.7   20,358   91.2
2026.03   83.2   17,920   70.6
2026.03   79.9   14,255   100
2026.03   74.4   14,499   100
2026.03   74.2   10,787   71
2026.03   69.4   10,970   70.7
2026.03   67.1   17,481   77
2026.03   65.8   11,014   81.5
2026.03   65.8   15,898   68.7
2026.03   63.3   9,429    71.3
2026.03   61.8   7,937    52.8
2026.03   59.4   17,763   79.8
2026.03   57.4   15,239   67.1
2026.03   55.9   20,049   84.9
2026.03   48     7,479    48.8
2026.03   43.6   7,711    36.5
2026.03   27.6   10,497   71
2026.03   26.3   2,472    19.9
2026.04   26.1   -        -
2026.04   25.4   -        -
2026.03   23.6   -        -
2026.03   23.1   -        -
2026.03   23     -        -
2026.03   22.8   -        -
2026.03   22.8   -        -
2026.03   22.5   -        -
2026.04   22.3   -        -
2026.03   22     2,355    18.6
2026.03   22     -        -
2026.03   21.7   -        -
2026.03   21.5   -        -
2026.03   21.4   10,349   67.8
2026.03   21.4   -        -
2026.03   21.3   -        -
2026.03   20.5   2,413    13.8
2026.03   19.1   -        -
2026.04   15     -        -
2026.04   13.2   -        -
2026.04   9.5    -        -
2026.03   8.2    -        -
2026.04   4.4    -        -
2026.04   0.2    -        -