PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization

About

Highly effective, task-specific prompts are often heavily engineered by experts to integrate detailed instructions and domain insights based on a deep understanding of both instincts of large language models (LLMs) and the intricacies of the target task. However, automating the generation of such expert-level prompts remains elusive. Existing prompt optimization methods tend to overlook the depth of domain knowledge and struggle to efficiently explore the vast space of expert-level prompts. Addressing this, we present PromptAgent, an optimization method that autonomously crafts prompts equivalent in quality to those handcrafted by experts. At its core, PromptAgent views prompt optimization as a strategic planning problem and employs a principled planning algorithm, rooted in Monte Carlo tree search, to strategically navigate the expert-level prompt space. Inspired by human-like trial-and-error exploration, PromptAgent induces precise expert-level insights and in-depth instructions by reflecting on model errors and generating constructive error feedback. Such a novel framework allows the agent to iteratively examine intermediate prompts (states), refine them based on error feedbacks (actions), simulate future rewards, and search for high-reward paths leading to expert prompts. We apply PromptAgent to 12 tasks spanning three practical domains: BIG-Bench Hard (BBH), as well as domain-specific and general NLP tasks, showing it significantly outperforms strong Chain-of-Thought and recent prompt optimization baselines. Extensive analyses emphasize its capability to craft expert-level, detailed, and domain-insightful prompts with great efficiency and generalizability.

Xinyuan Wang, Chenxi Li, Zhen Wang, Fan Bai, Haotian Luo, Jiayou Zhang, Nebojsa Jojic, Eric P. Xing, Zhiting Hu• 2023

Related benchmarks

Task	Dataset	Result
Arithmetic Reasoning	MultiArith	Accuracy98.33	293
Text Classification	TREC	Accuracy70.2	281
Arithmetic Reasoning	GSM8K	Accuracy76.5	272
Mathematical Reasoning	GSM8K	GSM8K Accuracy (%)94.2	204
Math Reasoning	AQUA	Accuracy92.24	188
Text Classification	MR	Accuracy88.62	174
Arithmetic Reasoning	ADDSUB	Accuracy83.5	149
Text Classification	SST-2	Accuracy94.58	136
Medical Question Answering	MedQA	Accuracy51.87	124
Text Classification	SST-5	Accuracy51.74	119

Showing 10 of 52 rows

Other info

Follow for update

@wizwand_team Discord