
Reasoning on AIME 25 (avg@k, Token Efficiency)

75.42 Avg@k

Type IV (STOP)

[Chart: axis ticks 20.6536, 34.8718, 49.09, 63.3082; date Apr 17, 2026]
Updated 1mo ago

Evaluation Results

Date      Avg@k   Tokens      Token Efficiency
2026.04   75.42   191,100     -71.62
2026.04   72.92   290,900     -79.62
2026.04   72.5    408,400     -71.39
2026.04   70.99   673,400     -
2026.04   70.83   197,700     -70.64
2026.04   70.68   1,427,000   -
2026.04   70.42   291,200     -79.6
2026.04   69.17   297,200     -79.18
2026.04   69.17   205,100     -69.54
2026.04   69.17   311,700     -53.71
2026.04   42.5    197,500     -71.91
2026.04   41.67   202,600     -71.18
2026.04   39.67   703,000     -
2026.04   39.17   317,600     -54.82
2026.04   35.42   207,400     -70.5
2026.04   26.67   206,600     -73.68
2026.04   24.17   214,700     -72.64
2026.04   24.17   325,000     -58.59
2026.04   23.75   208,700     -73.4
2026.04   22.76   784,800     -
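
The page does not say how the final figure in each row is derived. The numbers are consistent with a relative change in token count, in percent, measured against one of the rows whose final field is a bare `-`. The sketch below illustrates that reading. It is an inference from the listed values, not a documented formula: the function name `token_change_pct` is hypothetical, the pairing of each row with a baseline row is an assumption, and so is the reading of the leading `-` as a minus sign rather than a field separator.

```python
def token_change_pct(tokens: int, baseline_tokens: int) -> float:
    """Relative change in token count versus a baseline, in percent.

    Negative values mean fewer tokens than the baseline.
    """
    return round((tokens - baseline_tokens) / baseline_tokens * 100, 2)


# Assumed pairing: both rows are compared against the 673,400-token row,
# chosen only because it reproduces the values listed on the page.
print(token_change_pct(191_100, 673_400))  # -71.62
print(token_change_pct(197_700, 673_400))  # -70.64
```

The listed token counts look rounded to the nearest hundred, so a few rows recomputed this way differ from the page by 0.01.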