
Reasoning on AIME 24 (Token Efficiency Metrics)

79.17 Average @k Accuracy

Type IV (STOP)

[Chart] Apr 17, 2026
Updated 1mo ago

Evaluation Results

Date     Average @k Accuracy  Tokens     Token change (%)
2026.04  79.17                279,000    -79.51
2026.04  78.75                398,400    -70.73
2026.04  78.33                284,300    -79.11
2026.04  77.5                 290,400    -78.67
2026.04  77.5                 184,400    -68.98
2026.04  76.93                1,361,000  -
2026.04  76.25                299,800    -49.55
2026.04  75.26                594,200    -
2026.04  75                   187,000    -68.54
2026.04  73.33                196,600    -66.91
2026.04  61.67                189,000    -71.63
2026.04  55                   197,400    -70.38
2026.04  54.69                666,200    -
2026.04  54.58                312,500    -53.09
2026.04  53.75                202,700    -69.58
2026.04  37.92                204,300    -73.88
2026.04  32.92                210,600    -73.08
2026.04  32.5                 325,900    -58.34
2026.04  30.1                 782,300    -
2026.04  26.25                218,300    -72.09
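
The page does not define its headline metric. As a minimal sketch, assuming "Average @k Accuracy" has its usual meaning (draw k samples per problem, score each one, average over samples and then over problems), it could be computed as below. The function name `average_at_k` and the toy data are hypothetical and are not taken from the leaderboard.

```python
from typing import Sequence


def average_at_k(correct: Sequence[Sequence[bool]]) -> float:
    """Return average@k accuracy as a percentage.

    correct[i][j] is True when sample j for problem i was answered correctly.
    Assumption: every problem has at least one sample.
    """
    if not correct:
        raise ValueError("no problems given")
    if any(len(samples) == 0 for samples in correct):
        raise ValueError("every problem needs at least one sample")
    # Fraction of correct samples for each problem.
    per_problem = [sum(samples) / len(samples) for samples in correct]
    # Mean over problems, expressed as a percentage.
    return 100.0 * sum(per_problem) / len(per_problem)


# Toy data (hypothetical): 3 problems, k = 4 samples each.
results = [
    [True, True, True, False],     # 3 of 4 correct
    [True, False, False, False],   # 1 of 4 correct
    [False, False, False, False],  # 0 of 4 correct
]
score = average_at_k(results)
print(f"{score:.2f}")
```

Averaging within each problem first keeps every problem equally weighted even if the number of samples differs between problems.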