
Mathematical Reasoning on AIME 2025 (Acc, ARI, ABO, AIRW)

80.6 Accuracy

Standard CoT (RL Final)

May 5, 2026
Updated 28d ago

Evaluation Results

Date    | Acc  | ARI    | ABO    | AIRW
2026.05 | 80.6 | 16,995 | 16,709 | 16,709
2026.05 | 80   | 17,981 | 17,818 | 8,519
2026.05 | 79.2 | 16,375 | 16,322 | 13,829
2026.05 | 76.9 | 17,287 | 16,958 | 16,958
2026.05 | 76   | 14,132 | 14,043 | 8,619
2026.05 | 74   | 16,942 | 16,942 | 16,942
2026.05 | 73.8 | 21,580 | 21,316 | 21,316
2026.05 | 69.4 | 14,744 | 14,981 | 6,641
2026.05 | 66.3 | 17,862 | 17,339 | 17,339
2026.05 | 62.9 | 20,807 | 20,542 | 20,542
2026.05 | 50.8 | 7,540  | 8,819  | 3,522
2026.05 | 38.3 | 8,002  | 9,619  | 4,035