PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization
About
Highly effective, task-specific prompts are often heavily engineered by experts to integrate detailed instructions and domain insights based on a deep understanding of both the instincts of large language models (LLMs) and the intricacies of the target task. However, automating the generation of such expert-level prompts remains elusive. Existing prompt optimization methods tend to overlook the depth of domain knowledge and struggle to efficiently explore the vast space of expert-level prompts. Addressing this, we present PromptAgent, an optimization method that autonomously crafts prompts equivalent in quality to those handcrafted by experts. At its core, PromptAgent views prompt optimization as a strategic planning problem and employs a principled planning algorithm, rooted in Monte Carlo tree search, to strategically navigate the expert-level prompt space. Inspired by human-like trial-and-error exploration, PromptAgent induces precise expert-level insights and in-depth instructions by reflecting on model errors and generating constructive error feedback. This framework allows the agent to iteratively examine intermediate prompts (states), refine them based on error feedback (actions), simulate future rewards, and search for high-reward paths leading to expert prompts. We apply PromptAgent to 12 tasks spanning three practical domains: BIG-Bench Hard (BBH), as well as domain-specific and general NLP tasks, showing it significantly outperforms strong Chain-of-Thought and recent prompt optimization baselines. Extensive analyses emphasize its capability to craft expert-level, detailed, and domain-insightful prompts with great efficiency and generalizability.
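To make the planning view in the abstract concrete, here is a minimal Python sketch of a Monte Carlo tree search over prompts. It is an illustration under stated assumptions, not the authors' implementation: `expand` is a hypothetical callable that stands in for "reflect on model errors, produce feedback, and revise the prompt", and `reward` is a hypothetical scorer (in practice this might be accuracy on a held-out set). The toy versions below use no LLM at all, and the names `HINTS`, `toy_expand`, and `toy_reward` are invented for the example.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    prompt: str                                  # state: an intermediate prompt
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0


def uct(child: Node, parent_visits: int, c: float) -> float:
    """Upper-confidence score: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf                          # always try unvisited children first
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts_prompt_search(
    initial_prompt: str,
    expand: Callable[[str], List[str]],          # hypothetical: feedback-driven revisions
    reward: Callable[[str], float],              # hypothetical: prompt quality score
    iterations: int = 20,
    max_depth: int = 3,
    c: float = 1.4,
) -> Tuple[str, float]:
    root = Node(initial_prompt)
    best_prompt, best_reward = initial_prompt, reward(initial_prompt)
    for _ in range(iterations):
        # Selection: follow the highest-UCT child down to a leaf.
        node, depth = root, 0
        while node.children and depth < max_depth:
            node = max(node.children, key=lambda ch: uct(ch, node.visits, c))
            depth += 1
        # Expansion: each revised prompt becomes a child state.
        if depth < max_depth:
            for revised in expand(node.prompt):
                node.children.append(Node(revised, parent=node))
            if node.children:
                node = node.children[0]
        # Evaluation: score the prompt at the reached node.
        r = reward(node.prompt)
        if r > best_reward:
            best_prompt, best_reward = node.prompt, r
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += r
            node = node.parent
    return best_prompt, best_reward


# Toy stand-ins (assumptions for illustration only).
HINTS = ["Show your reasoning.", "Check units.", "State the final answer."]


def toy_expand(prompt: str) -> List[str]:
    # Pretend each missing hint is one piece of error feedback to incorporate.
    return [f"{prompt} {h}" for h in HINTS if h not in prompt]


def toy_reward(prompt: str) -> float:
    # Pretend quality is the fraction of hints the prompt already contains.
    return sum(h in prompt for h in HINTS) / len(HINTS)


best_prompt, best_reward = mcts_prompt_search("Solve the problem.", toy_expand, toy_reward)
print(best_prompt, best_reward)
```

The sketch scores each newly expanded prompt directly instead of simulating further revisions, which keeps it short. A fuller version would roll out additional steps before backpropagating, in the spirit of the "simulate future rewards" step described above.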
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Arithmetic Reasoning | MultiArith | Accuracy | 98.33 | 293 |
| Text Classification | TREC | Accuracy | 70.2 | 281 |
| Arithmetic Reasoning | GSM8K | Accuracy | 76.5 | 272 |
| Mathematical Reasoning | GSM8K | GSM8K Accuracy (%) | 94.2 | 204 |
| Math Reasoning | AQUA | Accuracy | 92.24 | 188 |
| Text Classification | MR | Accuracy | 88.62 | 174 |
| Arithmetic Reasoning | ADDSUB | Accuracy | 83.5 | 149 |
| Text Classification | SST-2 | Accuracy | 94.58 | 136 |
| Medical Question Answering | MedQA | Accuracy | 51.87 | 124 |
| Text Classification | SST-5 | Accuracy | 51.74 | 119 |