DeepTool: Scaling Interleaved Deliberation in Tool-Integrated Reasoning via Process-Supervised Reinforcement Learning

About

Tool-Integrated Reasoning (TIR) extends LLM capabilities by leveraging external environments. However, existing methods lack the deliberation during sequential tool invocation that strategic planning and self-correction require. Reinforcement learning (RL) mitigates this, but conventional RL approaches to TIR are hindered by sparse outcome-based rewards, which fail to supervise intermediate reasoning steps and tool invocations. To address this, we propose DeepTool, a novel framework that scales deliberate thinking within the interleaved process of thinking, action, and observation at each turn. First, we introduce a synthesis pipeline that evolves extended thinking into interleaved trajectories, integrating adversarial perturbations to ensure robustness and self-correction. Second, we devise Process-Supervised Reinforcement Learning based on GRPO, which uses an Action-Centric Process Reward to reinforce intermediate interleaved thinking and enforce precise tool invocation at every turn. Extensive experiments demonstrate that DeepTool achieves superior performance, significantly boosting Qwen2.5-7B across six benchmarks (e.g., AIME24: 3.2% -> 40.4%; HMMT25: 0.0% -> 28.6%). Furthermore, a token cost-effectiveness analysis confirms the utility of interleaved thinking, demonstrating DeepTool's optimal balance between performance and token efficiency.
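To make the GRPO-based training signal more concrete, here is a minimal sketch and not the authors' implementation. GRPO scores each sampled trajectory relative to the other trajectories in its group, normalizing rewards by the group mean and standard deviation. The abstract does not specify how the Action-Centric Process Reward is computed or combined with the outcome reward, so `trajectory_reward`, its `process_weight` parameter, and the per-turn scores are hypothetical placeholders.

```python
from statistics import mean, pstdev


def trajectory_reward(outcome_reward, turn_rewards, process_weight=0.5):
    """Hypothetical blend of outcome and per-turn process rewards.

    ASSUMPTION: the additive form and the weight are illustrative only;
    the source does not describe how the two signals are combined.
    """
    process = mean(turn_rewards) if turn_rewards else 0.0
    return outcome_reward + process_weight * process


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: standardize each reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against a zero spread
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled trajectories for one prompt: (outcome reward, per-turn scores).
group = [
    (1.0, [1.0, 1.0]),
    (0.0, [1.0, 0.0]),
    (1.0, [0.0, 1.0]),
    (0.0, [0.0, 0.0]),
]
rewards = [trajectory_reward(o, turns) for o, turns in group]
advantages = group_relative_advantages(rewards)
```

Under this sketch, two trajectories with the same final answer can still receive different advantages when their per-turn scores differ, which is the kind of intermediate supervision that sparse outcome-only rewards cannot provide.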

Yang He, Xiao Ding, Bibo Cai, Yufei Zhang, Kai Xiong, Zhouhao Sun, Bing Qin, Ting Liu • 2026

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	MATH 500	--	274
Mathematical Reasoning	AIME 2024	Mean Score (k=8): 40.4	81
Mathematical Reasoning	AMC 23	Avg@8: 75.3	60
Mathematical Reasoning	HMMT Feb 2025	--	54
Mathematical Reasoning	AIME 2025	Average@8 Score: 35	15
Mathematical Reasoning	OlympiadBench	Average@8 Score: 49.8	10
Mathematical Reasoning	GPQA Diamond	Average@8 Score: 45.3	10
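
The table does not define its metric names. Assuming the Avg@8, Average@8, and Mean Score (k=8) labels all follow the common convention of sampling 8 responses per problem and averaging their accuracy, the score could be computed as in the sketch below. The function name and the input layout are illustrative.

```python
def average_at_k(correct):
    """Mean accuracy (in percent) over problems, each with k graded samples.

    `correct` is a list with one entry per problem; each entry is a list of
    booleans, one per sampled response (k = 8 under the assumed convention).
    """
    per_problem = [sum(samples) / len(samples) for samples in correct]
    return 100.0 * sum(per_problem) / len(per_problem)


# Two problems, 8 samples each: 4 of 8 correct, then 8 of 8 correct.
graded = [[True] * 4 + [False] * 4, [True] * 8]
score = average_at_k(graded)
```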

