AdaPlanner: Adaptive Planning from Feedback with Language Models
About
Large language models (LLMs) have recently demonstrated potential as autonomous agents for sequential decision-making tasks. However, most existing methods either take actions greedily without planning or rely on static plans that cannot adapt to environmental feedback. Consequently, the sequential decision-making performance of LLM agents degrades as problem complexity and plan horizon increase. We propose a closed-loop approach, AdaPlanner, which allows the LLM agent to refine its self-generated plan adaptively in response to environmental feedback, using both in-plan and out-of-plan refinement strategies. To mitigate hallucination, we develop a code-style LLM prompt structure that facilitates plan generation across a variety of tasks, environments, and agent capabilities. Furthermore, we propose a skill discovery mechanism that leverages successful plans as few-shot exemplars, enabling the agent to plan and refine with fewer task demonstrations. Our experiments in the ALFWorld and MiniWoB++ environments demonstrate that AdaPlanner outperforms state-of-the-art baselines by 3.73% and 4.11% while utilizing 2x and 600x fewer samples, respectively.
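To make the closed-loop idea concrete, the following is a minimal sketch of replanning from environmental feedback. It is not the AdaPlanner implementation: `ToyEnv`, `stub_planner`, and `run_closed_loop` are hypothetical names, the planner is a hand-written stand-in for an LLM call, and the sketch does not reproduce the paper's in-plan and out-of-plan strategies, its code-style prompt, or its skill discovery mechanism. It only shows an agent discarding the remainder of a plan and generating a new one when an observation contradicts what the plan expected.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (action, observation) pairs
Plan = List[str]


class ToyEnv:
    """Hypothetical toy environment: a key is hidden in one of several drawers."""

    def __init__(self, key_location: str):
        self.key_location = key_location
        self.holding_key = False

    def step(self, action: str) -> str:
        """Execute an action and return a textual observation (the feedback)."""
        verb, target = action.split(" ", 1)
        if verb == "open":
            return "you see a key" if target == self.key_location else "it is empty"
        if verb == "take_from" and target == self.key_location:
            self.holding_key = True
            return "you take the key"
        return "nothing happens"


def stub_planner(history: History) -> Plan:
    """Stand-in for an LLM call: plan around drawers already observed to be empty."""
    tried = {a.split(" ", 1)[1] for a, obs in history if obs == "it is empty"}
    for drawer in ["drawer 1", "drawer 2", "drawer 3"]:
        if drawer not in tried:
            return [f"open {drawer}", f"take_from {drawer}"]
    return []  # nothing left to try


def run_closed_loop(
    env: ToyEnv,
    planner: Callable[[History], Plan],
    max_replans: int = 5,
) -> Tuple[bool, History]:
    """Execute a plan step by step, replanning when feedback contradicts it."""
    history: History = []
    plan = planner(history)
    replans = 0
    while plan:
        action = plan.pop(0)
        obs = env.step(action)
        history.append((action, obs))
        if env.holding_key:
            return True, history
        # The plan assumed the opened drawer holds the key. If the observation
        # says otherwise, drop the remaining steps and ask for a revised plan.
        if action.startswith("open") and obs != "you see a key":
            if replans >= max_replans:
                break
            replans += 1
            plan = planner(history)
    return env.holding_key, history


if __name__ == "__main__":
    success, trace = run_closed_loop(ToyEnv("drawer 3"), stub_planner)
    for action, obs in trace:
        print(f"{action!r} -> {obs!r}")
    print("success:", success)
```

A static plan that committed to the first drawer would fail here as soon as that drawer turned out to be empty. The loop above succeeds because each observation is fed back into the planner before the next step is chosen.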
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Tool Learning | RestBench TMDB | Success Rate | 66.2 | 32 |
| Function Calling | BFCL Multi-turn | Accuracy | 41.1 | 22 |
| LLM Agent Evaluation | Tau-bench retail | Pass@1 | 62.2 | 22 |
| Stateful Agent-User Interaction | Tau-bench airline | Pass@1 | 38.8 | 22 |
| Sequential Tool Use | RestBench Spotify | Success Rate | 70.1 | 22 |
| Tool-use API Generalization | ToolBench G1 v1 | Pass Rate | 62.2 | 22 |
| Tool-use API Generalization | ToolBench G2 | Pass Rate | 58.5 | 22 |
| Tool-use API Generalization | ToolBench (G3) | Pass Rate | 48.8 | 22 |
| Function Calling | BFCL Single-Turn | Accuracy | 69.2 | 22 |
| Interactive environment task success | ALFWorld (test) | Overall Success Rate | 91.79 | 20 |