Tool-Integrated Reasoning on Overall 9 Benchmarks
[Chart: Average Score over time. Top entry: AutoTraj, 88 (Jan 30, 2026).]
Evaluation Results

| Method | Tag | Date | Average Score |
|---|---|---|---|
| AutoTraj | Category=SFT-RL TIR Me... | 2026.01 | 88 |
| AutoTIR | Category=RL-only TIR M... | 2026.01 | 83 |
| Tool-Star-SFT | Category=SFT-only TIR... | 2026.01 | 82 |
| Tool-Star | Category=SFT-RL TIR Me... | 2026.01 | 79 |
| R1-Searcher | Category=RL-only TIR M... | 2026.01 | 78 |
| Vanilla SFT-RL TIR | Category=SFT-RL TIR Me... | 2026.01 | 78 |
| Qwen2.5-7B-Instruct | Framework=Multi-Dimens... | 2026.01 | 75 |
| ToRL | Category=RL-only TIR M... | 2026.01 | 74 |
| ReSearch | Category=RL-only TIR M... | 2026.01 | 68 |