
Long-Horizon Plan Execution in Large Tool Spaces through Entropy-Guided Branching

About

Large Language Models (LLMs) have significantly advanced tool-augmented agents, enabling autonomous reasoning via API interactions. However, executing multi-step tasks within massive tool libraries remains challenging due to two critical bottlenecks: (1) the absence of rigorous, plan-level evaluation frameworks and (2) the computational demand of exploring vast decision spaces stemming from large toolsets and long-horizon planning. To bridge these gaps, we first introduce SLATE (Synthetic Large-scale API Toolkit for E-commerce), a large-scale context-aware benchmark designed for the automated assessment of tool-integrated agents. Unlike static metrics, SLATE accommodates diverse yet functionally valid execution trajectories, revealing that current agents struggle with self-correction and search efficiency. Motivated by these findings, we next propose Entropy-Guided Branching (EGB), an uncertainty-aware search algorithm that dynamically expands decision branches where predictive entropy is high. EGB optimizes the exploration-exploitation trade-off, significantly enhancing both task success rates and computational efficiency. Extensive experiments on SLATE demonstrate that our dual contribution provides a robust foundation for developing reliable and scalable LLM agents in tool-rich environments.
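The core idea behind EGB, widening the search only where the agent is uncertain, can be illustrated with a minimal sketch. This is not the authors' implementation: the entropy threshold, the branch cap, the function names, and the example tool names below are all assumptions made for illustration, and the per-step tool probabilities are assumed to come from the agent's own predictive distribution.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def choose_branches(tool_probs, threshold=0.5, max_branches=3):
    """Pick which candidate tools to expand at one planning step.

    tool_probs: dict mapping a (hypothetical) tool name to its predicted
        probability at this step.
    threshold: assumed entropy cutoff above which the step counts as uncertain.
    max_branches: assumed cap on how many candidates to expand when uncertain.
    """
    ranked = sorted(tool_probs.items(), key=lambda kv: kv[1], reverse=True)
    entropy = shannon_entropy(tool_probs.values())
    # Low entropy: follow the single most likely tool (exploit).
    # High entropy: expand several top candidates (explore).
    width = max_branches if entropy > threshold else 1
    return [name for name, _ in ranked[:width]]


# Hypothetical e-commerce tools, used only to exercise the sketch.
confident_step = {"search_product": 0.97, "add_to_cart": 0.02, "check_order": 0.01}
uncertain_step = {"search_product": 0.40, "add_to_cart": 0.35, "check_order": 0.25}

print(choose_branches(confident_step))
print(choose_branches(uncertain_step))
```

Under these assumed settings, a confident step keeps a single branch while an uncertain step fans out to several candidates, so the search budget concentrates on the decisions where the model is least sure.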

Rongzhe Wei, Ge Shi, Min Cheng, Na Zhang, Pan Li, Sarthak Ghosh, Vaibhav Gorde, Leman Akoglu • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Tool-use Execution | UltraTool first 100 queries adapted for execution (test) | Execution Success Rate: 82 | 11 |
| Planning and Tool Use | SLATE synthetic | Tool Match Rate: 36.4 | 5 |
| Tool-use Search Evaluation | SLATE synthetic | Tool Match Rate: 68.5 | 5 |
