
Tool-Integrated Reasoning on AIME 25 (test accuracy)

Test Accuracy: 31.25

Token-ALP

[Chart: test accuracy over time]
Mar 19, 2026
Updated 26d ago

Evaluation Results

Date     Test Accuracy
2026.03  31.25
2026.03  28.85
2026.03  28.65
2026.03  24.79
2026.03  24.17
2026.03  24.06