
Chain-of-Thought Reasoning on AIME 24

Best accuracy: 81.46

[Chart: accuracy over time (Vanilla); Jun 1, 2026]

Evaluation Results

Date     Accuracy (%)  Length  Reduction (%)
2026.06  81.46         13,872  -
2026.06  77.71         12,141  12.5
2026.06  77.29         19,178  -
2026.06  76.88         18,714  2.4
2026.06  76.22         12,508  9.8
2026.06  76            18,925  1.3
2026.06  75.56         18,709  2.4
2026.06  70.42         11,454  17.4
2026.06  55.63         13,313  -
2026.06  53.75         10,912  18
2026.06  52.67         11,215  15.8
2026.06  46.67         10,756  19.2
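The last column appears to be the percentage by which an entry's length is shorter than that of one of the dash-marked (baseline) entries. For example, 12,141 against 13,872 gives (1 - 12,141 / 13,872) x 100 ≈ 12.5. The sketch below recomputes that column under this assumption. The pairing of each entry with a baseline (`GROUPS`) was inferred by matching the numbers and is not stated on the page; `GROUPS`, `REPORTED`, `reduction_pct` and `check` are hypothetical names used only for illustration.

```python
# ASSUMPTION: "Reduction (%)" = percent shorter than a baseline entry's length.
# The baseline grouping below is inferred from the numbers, not stated on the page.

def reduction_pct(length: int, baseline_length: int) -> float:
    """Percent by which `length` is shorter than `baseline_length`, rounded to 0.1."""
    return round(100.0 * (1.0 - length / baseline_length), 1)

# Hypothetical grouping: baseline length -> lengths of entries compared against it.
GROUPS = {
    13_872: [12_141, 12_508, 11_454],
    19_178: [18_714, 18_925, 18_709],
    13_313: [10_912, 11_215, 10_756],
}

# Reduction values as shown in the table, keyed by entry length.
REPORTED = {
    12_141: 12.5, 12_508: 9.8, 11_454: 17.4,
    18_714: 2.4, 18_925: 1.3, 18_709: 2.4,
    10_912: 18.0, 11_215: 15.8, 10_756: 19.2,
}


def check() -> list:
    """Return (length, baseline, computed, reported) for every row that disagrees."""
    mismatches = []
    for base, lengths in GROUPS.items():
        for n in lengths:
            got = reduction_pct(n, base)
            if got != REPORTED[n]:
                mismatches.append((n, base, got, REPORTED[n]))
    return mismatches


if __name__ == "__main__":
    for base, lengths in GROUPS.items():
        for n in lengths:
            print(f"{n:>6,} vs {base:,}: {reduction_pct(n, base)}%")
    print("mismatches:", check())
```

Under this grouping every computed value matches the table, which is the only evidence for the interpretation; it does not identify which methods the rows correspond to.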