
AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning

About

Agentic reinforcement learning has advanced large language models (LLMs) to reason through long chain-of-thought trajectories while interleaving external tool use. Existing approaches assume a fixed inventory of tools, limiting LLM agents' adaptability to new or evolving toolsets. We present AutoTool, a framework that equips LLM agents with dynamic tool-selection capabilities throughout their reasoning trajectories. We first construct a 200k dataset with explicit tool-selection rationales across 1,000+ tools and 100+ tasks spanning mathematics, science, code generation, and multimodal reasoning. Building on this data foundation, AutoTool employs a dual-phase optimization pipeline: (i) supervised and RL-based trajectory stabilization for coherent reasoning, and (ii) KL-regularized Plackett-Luce ranking to refine consistent multi-step tool selection. Across ten diverse benchmarks, we train two base models, Qwen3-8B and Qwen2.5-VL-7B, with AutoTool. With fewer parameters, AutoTool consistently outperforms advanced LLM agents and tool-integration methods, yielding average gains of 6.4% in math & science reasoning, 4.5% in search-based QA, 7.7% in code generation, and 6.9% in multimodal understanding. In addition, AutoTool exhibits stronger generalization by dynamically leveraging unseen tools from evolving toolsets during inference.
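The abstract names KL-regularized Plackett-Luce ranking as the second optimization phase but gives no formulation. The sketch below is only one plausible reading, not the authors' implementation. It assumes each candidate tool gets a scalar score, that the ranking loss is the negative Plackett-Luce log-likelihood of a preferred tool ordering, and that the KL term compares softmax distributions over tools under the trained policy and a frozen reference. The function names and the `beta` weight are hypothetical.

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]

def plackett_luce_log_prob(scores, ranking):
    """Log-probability of `ranking` (tool indices, best first) under Plackett-Luce.

    At each step the next tool is drawn from a softmax over the tools
    that have not been placed yet.
    """
    remaining = list(ranking)
    total = 0.0
    while remaining:
        step_log_probs = log_softmax([scores[i] for i in remaining])
        total += step_log_probs[0]  # the tool ranked next is first in `remaining`
        remaining.pop(0)
    return total

def kl_divergence(policy_scores, ref_scores):
    """KL(policy || reference) between softmax distributions over tools."""
    lp = log_softmax(policy_scores)
    lq = log_softmax(ref_scores)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))

def kl_regularized_pl_loss(policy_scores, ref_scores, ranking, beta=0.1):
    """Negative Plackett-Luce log-likelihood plus a KL penalty (beta is assumed)."""
    nll = -plackett_luce_log_prob(policy_scores, ranking)
    return nll + beta * kl_divergence(policy_scores, ref_scores)
```

For example, `kl_regularized_pl_loss([2.0, 0.5, -1.0], [1.0, 1.0, 1.0], ranking=[0, 1, 2])` scores an ordering that agrees with the policy scores. In this sketch, a larger `beta` penalizes drift from the reference distribution more heavily.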

Jiaru Zou, Ling Yang, Yunzhe Qi, Sirui Chen, Mengting Ai, Ke Shen, Jingrui He, Mengdi Wang • 2025

Related benchmarks

Task                           Dataset          Result         Rank
Mathematical Reasoning         AIME 25          Accuracy 51.2  201
Mathematical Reasoning         AIME 24          Accuracy 68.8  113
Reasoning                      HotpotQA         ACC1 45.1      25
Knowledge-intensive reasoning  2WikiMultihopQA  Accuracy 48.8  18
Multimodal Code Generation     V-Code           Accuracy 56.1  5
Multimodal Math Reasoning      V-Math           Accuracy 53    5
Multimodal Chart Reasoning     V-Chart          Accuracy 24.7  5
