ToolGen: Unified Tool Retrieval and Calling via Generation
About
As large language models (LLMs) advance, their inability to autonomously execute tasks by directly interacting with external tools remains a critical limitation. Traditional methods rely on inputting tool descriptions as context, which is constrained by context length and requires separate, often inefficient, retrieval mechanisms. We introduce ToolGen, a paradigm shift that integrates tool knowledge directly into the LLM's parameters by representing each tool as a unique token. This enables the LLM to generate tool calls and arguments as part of its next-token prediction capabilities, seamlessly blending tool invocation with language generation. Our framework allows the LLM to access and utilize a vast number of tools with no additional retrieval step, significantly enhancing both performance and scalability. Experimental results with over 47,000 tools show that ToolGen not only achieves superior results in both tool retrieval and autonomous task completion but also sets the stage for a new era of AI agents that can adapt to tools across diverse domains. By fundamentally transforming tool retrieval into a generative process, ToolGen paves the way for more versatile, efficient, and autonomous AI systems. ToolGen enables end-to-end tool learning and opens opportunities for integration with other advanced techniques such as chain-of-thought and reinforcement learning, thereby expanding the practical capabilities of LLMs.
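The idea of representing each tool as a single vocabulary token, so that retrieval becomes next-token prediction, can be illustrated with a toy sketch. This is not the authors' implementation: the `<<name>>` token format, the helper names `build_vocab` and `pick_tool`, the example tool names, and the hand-written logits are all assumptions made for illustration. Restricting the choice to tool tokens at selection time is likewise an assumed decoding strategy that the abstract does not describe.

```python
from typing import Dict, List, Tuple


def build_vocab(base_tokens: List[str], tool_names: List[str]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Extend a base vocabulary with one atomic token per tool.

    The "<<name>>" token format is hypothetical, chosen only for this sketch.
    """
    vocab = {tok: i for i, tok in enumerate(base_tokens)}
    tool_ids = {}
    for name in tool_names:
        token = f"<<{name}>>"
        vocab[token] = len(vocab)  # append the new token at the end of the vocabulary
        tool_ids[name] = vocab[token]
    return vocab, tool_ids


def pick_tool(logits: List[float], tool_ids: Dict[str, int]) -> str:
    """Select a tool by taking the argmax over tool-token logits only."""
    return max(tool_ids, key=lambda name: logits[tool_ids[name]])


base_tokens = ["<pad>", "the", "weather", "in", "Paris"]
tool_names = ["get_weather", "search_flights", "convert_currency"]
vocab, tool_ids = build_vocab(base_tokens, tool_names)

# Made-up next-token logits over the 8-entry vocabulary. An ordinary word
# ("the") scores highest overall, but only tool tokens are eligible here.
logits = [0.1, 2.0, 0.3, 0.2, 0.5, 1.7, 0.4, -0.2]
chosen = pick_tool(logits, tool_ids)
print(chosen)
```

Because every tool is a single token, choosing a tool costs one decoding step over the extended vocabulary rather than a call to a separate retriever. In a real model the logits would come from the LLM's output layer after the tool tokens had been added and trained.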
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Tool Retrieval | ToolBench In-domain I1 | NDCG@1 | 91 | 29 |
| Tool Retrieval | ToolBench In-domain (I2) | NDCG@1 | 91.45 | 20 |
| Tool Retrieval | ToolBench In-domain (I3) | NDCG@1 | 87 | 20 |
| End-to-end Tool-use | ToolBench I1 v1 | SoPR | 56.13 | 16 |
| End-to-end Tool-use | ToolBench v1 (I2) | SoPR | 52.2 | 12 |
| End-to-end Tool-use | ToolBench I3 v1 | SoPR | 51.37 | 12 |
| End-to-end Tool-use | ToolBench I1-Cat v1 | SoPR | 61.76 | 11 |
| Tool Retrieval | ToolBench Multi-domain (I2) | NDCG@1 | 84 | 9 |
| Tool Retrieval | ToolBench I3 | NDCG@1 | 81 | 9 |
| Tool Calling | ToolBench generalization dataset (I1-Tool) | SoPR | 57.7 | 7 |