Multi-Turn Tool-Integrated Reasoning (TIR) on AIME24 (Peak avg@32 score)
Top Peak avg@32 score: 41.46 (OTB)
[Chart: Peak avg@32 score by date; date axis label: Feb 6, 2026]
Updated 1mo ago
Evaluation Results
Method      Details                      Date      Peak avg@32 score
OTB         Backbone=Qwen2.5-7B, R...    2026.02   41.46
SimpleTIR   Backbone=Qwen2.5-7B, R...    2026.02   37.91
OGB         Backbone=Qwen2.5-7B, R...    2026.02   30.63
OPO         Backbone=Qwen2.5-7B, R...    2026.02   29.9
GRPO        Backbone=Qwen2.5-7B, R...    2026.02   27.6
RLOO        Backbone=Qwen2.5-7B, R...    2026.02   25