Multi-Turn Tool-Integrated Reasoning (TIR) on AMC23
Top result: OTB, with a Peak avg@32 score of 79.45.
[Chart: Peak avg@32 score by method over time (Feb 6, 2026)]
Evaluation Results
| Method    | Setting                   | Date    | Peak avg@32 Score |
|-----------|---------------------------|---------|-------------------|
| OTB       | Backbone=Qwen2.5-7B, R... | 2026.02 | 79.45             |
| SimpleTIR | Backbone=Qwen2.5-7B, R... | 2026.02 | 71.25             |
| OPO       | Backbone=Qwen2.5-7B, R... | 2026.02 | 64.84             |
| OGB       | Backbone=Qwen2.5-7B, R... | 2026.02 | 62.5              |
| RLOO      | Backbone=Qwen2.5-7B, R... | 2026.02 | 62.42             |
| GRPO      | Backbone=Qwen2.5-7B, R... | 2026.02 | 59.45             |
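This page does not define "Peak avg@32". A common reading, assumed here rather than confirmed by the source, is that the evaluator samples 32 responses per AMC23 problem, averages correctness within each problem, averages across problems, and then reports the best such score over the evaluated training checkpoints. The sketch below illustrates that reading. The function names `avg_at_k` and `peak_avg_at_k` and the nested-list input format are hypothetical.

```python
from statistics import mean


def avg_at_k(correct: list[list[bool]]) -> float:
    """avg@k as a percentage (assumed definition).

    correct[i][j] is True if sample j for problem i was judged correct.
    Each problem's accuracy is the fraction of its k samples that are correct;
    the score is the mean of those per-problem accuracies.
    """
    per_problem = [sum(samples) / len(samples) for samples in correct]
    return 100.0 * mean(per_problem)


def peak_avg_at_k(checkpoint_results: dict[int, list[list[bool]]]) -> tuple[int, float]:
    """Hypothetical "peak": the best avg@k across training checkpoints.

    Returns (checkpoint_step, score) for the highest-scoring checkpoint.
    """
    scores = {step: avg_at_k(results) for step, results in checkpoint_results.items()}
    best_step = max(scores, key=scores.get)
    return best_step, scores[best_step]


if __name__ == "__main__":
    # Toy data with k=4 samples per problem (instead of 32) and two problems.
    toy = {
        100: [[True, True, False, False], [True, True, True, True]],
        200: [[True, False, False, False], [True, True, True, False]],
    }
    print(peak_avg_at_k(toy))  # (100, 75.0)
```

Under this reading, a single-sample accuracy would be noisy on a 40-problem set such as AMC23, so averaging over many samples per problem gives a more stable comparison between methods.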