Multi-Turn Tool-Integrated Reasoning (TIR) on MATH500
Peak avg@32 Score: 84.69 (OTB, Feb 6, 2026)
Evaluation Results

Method     Configuration              Date     Peak avg@32 Score
OTB        Backbone=Qwen2.5-7B, R...  2026.02  84.69
SimpleTIR  Backbone=Qwen2.5-7B, R...  2026.02  82.25
OPO        Backbone=Qwen2.5-7B, R...  2026.02  76.94
GRPO       Backbone=Qwen2.5-7B, R...  2026.02  76.88
RLOO       Backbone=Qwen2.5-7B, R...  2026.02  76.06
OGB        Backbone=Qwen2.5-7B, R...  2026.02  75.69