
Mathematical Reasoning on AIME 25 (Acc., #Tokens)

Accuracy (AIME 25): 61.46

[Chart: Accuracy (AIME 25) over time; axis label: Jul 20, 2025]
Updated 22d ago

Evaluation Results

Date     Accuracy  #Tokens
2025.07  61.46     17,138
2025.07  61.25     17,779
2025.07  60.83     18,608
2025.07  60.21     16,156
2025.07  59.79     18,131
2025.07  59.38     17,256
2025.07  58.96     18,104
2025.07  55.42     20,000
2025.07  31.25     16,146
2025.07  30.63     16,948
2025.07  29.17     17,880
2025.07  29.17     17,316
2025.07  18.75     4,746
2025.07  15        2,577
2025.07  9.58      4,273