AutoTIR: Autonomous Tools Integrated Reasoning via Reinforcement Learning

About

Large Language Models (LLMs), when enhanced through reasoning-oriented post-training, evolve into powerful Large Reasoning Models (LRMs). Tool-Integrated Reasoning (TIR) further extends their capabilities by incorporating external tools, but existing methods often rely on rigid, predefined tool-use patterns that risk degrading core language competence. Inspired by the human ability to adaptively select tools, we introduce AutoTIR, a reinforcement learning framework that enables LLMs to autonomously decide whether and which tool to invoke during the reasoning process, rather than following static tool-use strategies. AutoTIR leverages a hybrid reward mechanism that jointly optimizes for task-specific answer correctness, structured output adherence, and penalization of incorrect tool usage, thereby encouraging both precise reasoning and efficient tool integration. Extensive evaluations across diverse knowledge-intensive, mathematical, and general language modeling tasks demonstrate that AutoTIR achieves superior overall performance, significantly outperforming baselines and exhibiting superior generalization in tool-use behavior. These results highlight the promise of reinforcement learning in building truly generalizable and scalable TIR capabilities in LLMs. The code and data are available at https://github.com/weiyifan1023/AutoTIR.
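
To make the hybrid reward described above more concrete, here is a minimal Python sketch. It is not the authors' formulation: the `Rollout` fields, the exact-match correctness check, the definition of tool misuse (calling a tool when none is needed, or skipping one when it is), and all weights are assumptions chosen only to illustrate how the three signals might be combined into one scalar.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """Hypothetical summary of one model response (all fields are assumptions)."""
    answer: str        # final answer extracted from the model output
    well_formed: bool  # whether the output followed the expected structure
    tool_called: bool  # whether the model invoked any tool
    tool_needed: bool  # assumed label: does this task benefit from a tool?


def hybrid_reward(
    rollout: Rollout,
    gold: str,
    w_answer: float = 1.0,        # assumed weight, not taken from the paper
    w_format: float = 0.2,        # assumed weight, not taken from the paper
    w_tool_penalty: float = 0.5,  # assumed weight, not taken from the paper
) -> float:
    """Combine correctness, format adherence, and a tool-misuse penalty."""
    # Task-specific correctness, simplified here to exact string match.
    correctness = 1.0 if rollout.answer.strip() == gold.strip() else 0.0
    # Structured-output adherence.
    format_bonus = 1.0 if rollout.well_formed else 0.0
    # Assumed notion of incorrect tool usage: the tool decision
    # disagrees with whether the task needed a tool.
    misuse = 1.0 if rollout.tool_called != rollout.tool_needed else 0.0
    return w_answer * correctness + w_format * format_bonus - w_tool_penalty * misuse
```

Under these assumptions, a correct, well-formed answer that skips an unnecessary tool call scores highest, while an unneeded tool call lowers the reward even when the output is well formed.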

Yifan Wei, Xiaoyan Yu, Yixuan Weng, Tengfei Pan, Angsheng Li, Li Du • 2025

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	MATH	Accuracy 59	535
Mathematical Reasoning	AMC 23	Accuracy 35	198
Mathematical Reasoning	AIME24	Accuracy 80	160
Mathematical Reasoning	AIME 24	Pass@1 Accuracy 33.33	117
Mathematical Reasoning	GSM8K	--	102
Mathematical Reasoning	AIME 24	AIME 24 Accuracy 6.67	84
Knowledge-intensive reasoning	HLE	Avg Score 85	75
Knowledge-intensive reasoning	MuSiQue	Accuracy 85	51
Mathematical Reasoning	AIME25	Accuracy 6.67	41
Knowledge-intensive reasoning	2WikiMultihopQA	Accuracy 25	38

Showing 10 of 17 rows
