AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning

About

Agentic reinforcement learning has advanced large language models (LLMs) to reason through long chain-of-thought trajectories while interleaving external tool use. Existing approaches assume a fixed inventory of tools, limiting LLM agents' adaptability to new or evolving toolsets. We present AutoTool, a framework that equips LLM agents with dynamic tool-selection capabilities throughout their reasoning trajectories. We first construct a 200k dataset with explicit tool-selection rationales across 1,000+ tools and 100+ tasks spanning mathematics, science, code generation, and multimodal reasoning. Building on this data foundation, AutoTool employs a dual-phase optimization pipeline: (i) supervised and RL-based trajectory stabilization for coherent reasoning, and (ii) KL-regularized Plackett-Luce ranking to refine consistent multi-step tool selection. We train two base models, Qwen3-8B and Qwen2.5-VL-7B, with AutoTool and evaluate them on ten diverse benchmarks. With fewer parameters, AutoTool consistently outperforms advanced LLM agents and tool-integration methods, yielding average gains of 6.4% in math & science reasoning, 4.5% in search-based QA, 7.7% in code generation, and 6.9% in multimodal understanding. In addition, AutoTool exhibits stronger generalization by dynamically leveraging unseen tools from evolving toolsets during inference.
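The abstract names KL-regularized Plackett-Luce ranking but does not spell out the objective. As a rough illustration only, the sketch below shows the generic form such a loss could take: the negative Plackett-Luce log-likelihood of a preferred tool ordering plus a KL penalty toward a reference scorer. The function names, the `beta` weight, and the choice of a softmax-over-tools KL term are all assumptions made for this example, not the paper's actual formulation.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def plackett_luce_log_prob(scores, ranking):
    """Log-probability of `ranking` (tool indices, best first) under Plackett-Luce.

    At each step the next tool is drawn from a softmax over the tools
    that have not yet been placed in the ranking.
    """
    remaining = list(ranking)
    total = 0.0
    for idx in ranking:
        step_log_probs = log_softmax([scores[j] for j in remaining])
        total += step_log_probs[remaining.index(idx)]
        remaining.remove(idx)
    return total


def kl_divergence(policy_scores, ref_scores):
    """KL(policy || reference) between softmax distributions over tools."""
    lp = log_softmax(policy_scores)
    lq = log_softmax(ref_scores)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))


def kl_regularized_pl_loss(policy_scores, ref_scores, ranking, beta=0.1):
    """Hypothetical objective: ranking NLL plus beta-weighted KL penalty."""
    nll = -plackett_luce_log_prob(policy_scores, ranking)
    return nll + beta * kl_divergence(policy_scores, ref_scores)


# Three candidate tools; the preferred order is tool 0, then 1, then 2.
loss = kl_regularized_pl_loss([3.0, 2.0, 1.0], [0.0, 0.0, 0.0], [0, 1, 2])
```

In this sketch the KL term keeps the learned tool scores close to the reference scorer, while the Plackett-Luce term rewards assigning higher scores to tools that appear earlier in the preferred ordering.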

Jiaru Zou, Ling Yang, Yunzhe Qi, Sirui Chen, Mengting Ai, Ke Shen, Jingrui He, Mengdi Wang • 2025

Related benchmarks

Task	Dataset	Metric	Result
Mathematical Reasoning	AIME 25	Accuracy	51.2	201
Mathematical Reasoning	AIME 24	Accuracy	68.8	113
Reasoning	HotpotQA	ACC1	45.1	25
Knowledge-intensive reasoning	2WikiMultihopQA	Accuracy	48.8	18
Multimodal Code Generation	V-Code	Accuracy	56.1	5
Multimodal Math Reasoning	V-Math	Accuracy	53	5
Multimodal Chart Reasoning	V-Chart	Accuracy	24.7	5

