
Beyond Single-Shot: Multi-step Tool Retrieval via Query Planning

About

LLM agents operating over massive, dynamic tool libraries depend on effective retrieval, yet standard single-shot dense retrievers struggle with complex requests. These failures stem primarily from two sources: the semantic gap between abstract user goals and technical tool documentation, and the limited capacity of fixed-size embeddings to model combinatorial tool compositions. To address these challenges, we propose TOOLQP, a lightweight framework that models retrieval as iterative query planning. Instead of single-shot matching, TOOLQP decomposes instructions into sub-tasks and dynamically generates queries to interact with the retriever, bridging the semantic gap by targeting the specific sub-tasks required for composition. We train TOOLQP on synthetic query trajectories and then optimize it with Reinforcement Learning with Verifiable Rewards (RLVR). Experiments show that TOOLQP achieves state-of-the-art performance, with superior zero-shot generalization, robustness across diverse retrievers, and significant improvements in downstream agentic execution.
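The query-planning loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the tool library, the word-overlap `retrieve` function (a stand-in for a dense retriever), the rule-based `next_query` planner (a stand-in for the trained TOOLQP model), and the recall-based `recall_reward` (one plausible verifiable reward; the abstract does not specify the reward used in RLVR) are all hypothetical.

```python
from typing import Optional

# Hypothetical toy tool library (name -> documentation); not from the paper.
TOOLS = {
    "geocode": "convert a street address into latitude and longitude coordinates",
    "weather_forecast": "get the weather forecast for latitude and longitude coordinates",
    "send_email": "send an email message to a recipient address",
    "currency_convert": "convert an amount between two currencies",
}


def retrieve(query: str, k: int) -> list[tuple[str, float]]:
    """Stand-in retriever: fraction of query words found in each tool's doc.
    A real system would use a dense (embedding) retriever instead."""
    words = set(query.lower().split())
    if not words:
        return []
    scored = [(name, len(words & set(doc.split())) / len(words))
              for name, doc in TOOLS.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(name, s) for name, s in scored[:k] if s > 0]


def next_query(instruction: str, history: list) -> Optional[str]:
    """Hypothetical planner. TOOLQP uses a trained model that sees the
    instruction and earlier (query, results) pairs; this placeholder just
    emits the next ' then '-separated sub-task, or None when all are done."""
    sub_tasks = [p.strip() for p in instruction.split(" then ") if p.strip()]
    return sub_tasks[len(history)] if len(history) < len(sub_tasks) else None


def iterative_retrieve(instruction: str, k: int = 1, max_steps: int = 5) -> list[str]:
    """Multi-step retrieval: plan a query, call the retriever, record the
    results, repeat; finally merge hits, ranking each tool by its best score."""
    history: list = []
    best: dict[str, float] = {}
    for _ in range(max_steps):
        query = next_query(instruction, history)
        if query is None:
            break
        results = retrieve(query, k)
        history.append((query, results))
        for name, score in results:
            best[name] = max(score, best.get(name, 0.0))
    return sorted(best, key=best.get, reverse=True)


def recall_reward(retrieved: list[str], gold: set[str]) -> float:
    """Assumed verifiable reward for RL training: recall of the gold tool set."""
    return len(set(retrieved) & gold) / len(gold) if gold else 0.0


request = ("convert a street address into coordinates "
           "then get the weather forecast for those coordinates")
tools = iterative_retrieve(request)
print(tools)  # ['geocode', 'weather_forecast']
print(recall_reward(tools, {"geocode", "weather_forecast"}))  # 1.0
```

Each sub-task query closely matches one tool's documentation, so both tools needed for the composition surface even with a single hit per query. Merging by best score is only one simple choice; a learned planner could also reformulate a query when the previous results look poor.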

Wei Fang, James Glass • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Tool Calling | API-Bank L-1 | -- | -- | 46
Tool Calling | API-Bank L-2 | -- | -- | 25
Tool Retrieval | TOOLRET In-Domain (Avg) | nDCG@10 | 63.1 | 15
Tool Retrieval | TOOLRET Zero-Shot Code | nDCG@10 | 32 | 15
Tool Retrieval | TOOLRET Zero-Shot Custom | nDCG@10 | 45.8 | 15
Tool Retrieval | TOOLRET Zero-Shot Macro-Avg | nDCG@10 | 36.9 | 15
Tool Retrieval | TOOLRET Zero-Shot Web* | nDCG@10 | 33 | 15
Tool Calling | ToolBench generalization dataset (I2-Cat) | -- | -- | 7
Tool Calling | StableToolBench (STB) I3-Inst | Solvable Pass Rate | 48.3 | 6
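The tool-retrieval rows report nDCG@10. As a reference point, here is a minimal sketch of the standard formulation, assuming binary relevance (a tool is either in the gold set or not) and the usual log2 rank discount; the exact TOOLRET evaluation code is not shown here, and the scores in the table appear to be on a 0 to 100 scale rather than 0 to 1.

```python
import math


def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) for ranks i = 1..k."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """nDCG@k with binary relevance: DCG of the ranking divided by the ideal DCG."""
    gains = [1.0 if tool in relevant else 0.0 for tool in ranked]
    ideal = [1.0] * min(len(relevant), k)  # all relevant tools ranked first
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


# Hypothetical ranking: one irrelevant tool sits between the two gold tools.
score = ndcg_at_k(["geocode", "send_email", "weather_forecast"],
                  {"geocode", "weather_forecast"})
print(round(score, 4))  # 0.9197
```

Here DCG is 1 + 1/log2(4) = 1.5 and the ideal DCG is 1 + 1/log2(3) ≈ 1.631, so placing the irrelevant tool at rank 2 costs about 8% of the achievable score.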
