Beyond Single-Shot: Multi-step Tool Retrieval via Query Planning

About

LLM agents operating over massive, dynamic tool libraries rely on effective retrieval, yet standard single-shot dense retrievers struggle with complex requests. These failures primarily stem from the disconnect between abstract user goals and technical documentation, and the limited capacity of fixed-size embeddings to model combinatorial tool compositions. To address these challenges, we propose TOOLQP, a lightweight framework that models retrieval as iterative query planning. Instead of single-shot matching, TOOLQP decomposes instructions into sub-tasks and dynamically generates queries to interact with the retriever, effectively bridging the semantic gap by targeting the specific sub-tasks required for composition. We train TOOLQP using synthetic query trajectories followed by optimization via Reinforcement Learning with Verifiable Rewards (RLVR). Experiments demonstrate that TOOLQP achieves state-of-the-art performance, exhibiting superior zero-shot generalization, robustness across diverse retrievers, and significant improvements in downstream agentic execution.
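
The contrast between single-shot matching and per-sub-task querying described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the toy tool library `TOOLS`, the word-overlap `retrieve` function standing in for a dense retriever, and the rule-based `plan_subtasks` function standing in for the trained planner. Unlike the framework in the abstract, this planner does not adapt its queries to what the retriever returns.

```python
import re
from typing import Dict, List, Set

# Hypothetical toy tool library (name -> documentation), for illustration only.
TOOLS: Dict[str, str] = {
    "get_weather": "return the weather forecast for a city",
    "search_flights": "search flights between two airports",
    "convert_currency": "convert an amount from one currency to another",
    "book_hotel": "book a hotel room in a city",
}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, k: int = 1) -> List[str]:
    """Stand-in retriever: rank tools by word overlap with the query.

    Ties are broken alphabetically by tool name.
    """
    q = tokenize(query)
    ranked = sorted(TOOLS, key=lambda name: (-len(q & tokenize(TOOLS[name])), name))
    return ranked[:k]


def plan_subtasks(instruction: str) -> List[str]:
    """Stand-in planner (assumption): split on 'and', 'then', commas, semicolons.

    A learned planner would generate the sub-task queries instead.
    """
    parts = re.split(r"\b(?:and then|then|and)\b|[,;]", instruction)
    return [p.strip() for p in parts if p.strip()]


def iterative_retrieve(instruction: str, k: int = 1) -> List[str]:
    """Issue one query per sub-task and merge the results without duplicates."""
    found: List[str] = []
    for subtask in plan_subtasks(instruction):
        for tool in retrieve(subtask, k):
            if tool not in found:
                found.append(tool)
    return found


instruction = (
    "check the weather forecast in Paris and search flights between airports "
    "then convert currency"
)
single_shot = retrieve(instruction, k=1)            # one query for the whole request
multi_step = iterative_retrieve(instruction, k=1)   # one query per sub-task
```

With the same per-query budget of one result, the single query can surface only one tool, while the per-sub-task loop can surface one tool for each step of the request.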

Wei Fang, James Glass • 2026

Related benchmarks

Task	Dataset	Result
Tool Calling	API-Bank L-1	--	46
Tool Calling	API-Bank L-2	--	25
Tool Retrieval	TOOLRET In-Domain (Avg)	nDCG@10: 63.1	15
Tool Retrieval	TOOLRET Zero-Shot Code	nDCG@10: 32	15
Tool Retrieval	TOOLRET Zero-Shot Custom	nDCG@10: 45.8	15
Tool Retrieval	TOOLRET Zero-Shot Macro-Avg	nDCG@10: 36.9	15
Tool Retrieval	TOOLRET Zero-Shot Web*	nDCG@10: 33	15
Tool Calling	ToolBench generalization dataset (I2-Cat)	--	7
Tool Calling	StableToolBench (STB) I3-Inst	Solvable Pass Rate: 48.3	6
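
For readers unfamiliar with the retrieval metric in the table, the following is a minimal sketch of nDCG@10 under the assumption of binary relevance (a tool is either required for the request or not). The function names are hypothetical, and the benchmark's own evaluation code may differ, for example by using graded relevance. The sketch returns a value in [0, 1]; the table entries look like the same quantity scaled by 100, which is also an assumption.

```python
import math
from typing import List, Set


def dcg_at_k(gains: List[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions (rank 1 has discount 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """nDCG@k with binary relevance: DCG of the ranking divided by the ideal DCG."""
    gains = [1.0 if tool in relevant else 0.0 for tool in ranked]
    # Ideal ranking places every relevant tool first (at most k of them count).
    ideal = [1.0] * min(len(relevant), k)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


# Two relevant tools, retrieved at ranks 1 and 3.
score = ndcg_at_k(["a", "b", "c"], {"a", "c"}, k=10)
```

In this example the DCG is 1 + 1/log2(4) = 1.5 and the ideal DCG is 1 + 1/log2(3), so the score is slightly below 1 because one relevant tool sits at rank 3 instead of rank 2.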
