Mathematical Reasoning on AIME 25 (Pass@1, Token count)

72.2 Pass@1 Accuracy

RED

Apr 3, 2026
Updated 12d ago

Evaluation Results

Date      Pass@1   Token count
2026.04   72.2     5,908
2026.04   71.1     8,569
2026.04   71.1     6,389
2026.04   70       12,487
2026.04   68.9     11,589
2026.04   68.9     6,504
2026.04   66.7     8,417
2026.04   66.7     6,854
2026.04   65.6     6,772
2026.04   63.3     8,607
2026.04   63.3     6,203
2026.04   63.3     5,018
2026.04   62.2     6,674
2026.04   62.2     11,098
2026.04   62.2     6,124
2026.04   62.2     8,990
2026.04   61.1     12,490
2026.04   61.1     7,331
2026.04   61.1     8,994
2026.04   60       6,055
2026.04   60       5,334
2026.04   58.9     9,271
2026.04   58.9     7,639
2026.04   58.9     8,906
2026.04   58.9     4,711
2026.04   57.8     8,008
2026.04   54.4     6,647
2026.04   51.1     8,715
2026.04   50       8,689
2026.04   50       4,303
2026.04   48.9     7,478
2026.04   48.9     5,896
2026.04   47.8     5,690
2026.04   47.8     8,909
2026.04   46.7     4,414
2026.04   45.6     12,006
2026.04   45.6     5,932
2026.04   45.6     5,967
2026.04   44.4     8,921
2026.04   44.4     5,794
2026.04   43.3     11,454
2026.04   43.3     6,255
2026.04   42.2     8,261
2026.04   41.1     7,904
2026.04   38.9     4,760
2026.04   34.4     9,005
2026.04   34.4     5,039
2026.04   32.2     8,438
2026.04   31.1     8,936
2026.04   31.1     5,426
2026.04   30       7,158
2026.04   30       5,897
2026.04   30       11,987
2026.04   28.9     11,548.2