Chain-of-Thought Reasoning on AIME 25

Top result: 81.04 accuracy (Vanilla)

[Chart: accuracy by submission date, Jun 1, 2026]

Evaluation Results

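Date     Accuracy  Tokens  Token reduction (%)
2026.06  81.04     22,613  -
2026.06  78.13     20,663  8.6
2026.06  76.2      21,056  6.9
2026.06  74        20,702  8.4

2026.06  71.67     16,727  -
2026.06  68.96     14,653  12.4
2026.06  68.89     15,438  7.7
2026.06  68.27     14,305  14.5

2026.06  41.04     14,556  -
2026.06  39.17     11,740  19.3
2026.06  37.5      12,148  16.5
2026.06  37.11     12,822  11.9

Token reduction is the percentage decrease in tokens relative to the row marked "-" at the top of each group.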
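The token-reduction figures in the Evaluation Results table can be checked by hand. The sketch below is a minimal Python example, assuming (as the numbers suggest) that each value is 100 × (baseline tokens − method tokens) / baseline tokens, where the baseline is the row with "-" in the last column. `token_reduction` and `GROUPS` are illustrative names, not part of the leaderboard. From the listed token counts, every value reproduces except the 20,702-token row. That row works out to 8.45%, which rounds to 8.5 rather than the listed 8.4, possibly because the page shows token counts rounded to whole numbers.

```python
# Hypothetical check of the "Token reduction (%)" column, assuming it is the
# percentage drop in tokens relative to the baseline row ("-") of each group.

def token_reduction(baseline_tokens: int, tokens: int) -> float:
    """Percent fewer tokens than the baseline, rounded to one decimal place."""
    return round(100 * (baseline_tokens - tokens) / baseline_tokens, 1)

# Baseline token count -> token counts of the other rows in that group,
# copied from the Evaluation Results table.
GROUPS = {
    22_613: [20_663, 21_056, 20_702],
    16_727: [14_653, 15_438, 14_305],
    14_556: [11_740, 12_148, 12_822],
}

for baseline, runs in GROUPS.items():
    print(baseline, [token_reduction(baseline, t) for t in runs])
# 22613 [8.6, 6.9, 8.5]
# 16727 [12.4, 7.7, 14.5]
# 14556 [19.3, 16.5, 11.9]
```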