Tool-Integrated Reasoning on AIME 24 (test accuracy)
[Chart: Test Accuracy over time. Top entry: Seq-ALP, 43.85, Mar 19, 2026.]
Updated 26d ago
Evaluation Results
Method       Settings                     Date      Test Accuracy
Seq-ALP      temperature=1.0, sampl...    2026.03   43.85
Token-MIS    temperature=1.0, sampl...    2026.03   39.48
Seq-MIS      temperature=1.0, sampl...    2026.03   39.48
Token-ALP    temperature=1.0, sampl...    2026.03   38.65
Seq-Bypass   temperature=1.0, sampl...    2026.03   37.19
GRPO         temperature=1.0, sampl...    2026.03   35.42