
PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization

About

Highly effective, task-specific prompts are often heavily engineered by experts to integrate detailed instructions and domain insights, based on a deep understanding of both the instincts of large language models (LLMs) and the intricacies of the target task. However, automating the generation of such expert-level prompts remains elusive. Existing prompt optimization methods tend to overlook the depth of domain knowledge and struggle to efficiently explore the vast space of expert-level prompts. Addressing this, we present PromptAgent, an optimization method that autonomously crafts prompts equivalent in quality to those handcrafted by experts. At its core, PromptAgent views prompt optimization as a strategic planning problem and employs a principled planning algorithm, rooted in Monte Carlo tree search, to strategically navigate the expert-level prompt space. Inspired by human-like trial-and-error exploration, PromptAgent induces precise expert-level insights and in-depth instructions by reflecting on model errors and generating constructive error feedback. Such a novel framework allows the agent to iteratively examine intermediate prompts (states), refine them based on error feedback (actions), simulate future rewards, and search for high-reward paths leading to expert prompts. We apply PromptAgent to 12 tasks spanning three practical domains: BIG-Bench Hard (BBH), as well as domain-specific and general NLP tasks, showing it significantly outperforms strong Chain-of-Thought and recent prompt optimization baselines. Extensive analyses emphasize its capability to craft expert-level, detailed, and domain-insightful prompts with great efficiency and generalizability.
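The abstract describes a Monte Carlo tree search loop over prompt space: prompts are states, feedback-driven revisions are actions, and a reward (e.g., dev-set accuracy) guides the search. The sketch below illustrates that loop in miniature; it is not the paper's implementation, and the `evaluate` and `revise` callables are stand-ins an adopter would supply (e.g., an LLM-based rewriter that incorporates error feedback, and a scorer over held-out examples).

```python
import math


class Node:
    """A node in the search tree: one candidate prompt (a 'state')."""

    def __init__(self, prompt, parent=None):
        self.prompt = prompt
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0  # accumulated reward from simulations


def ucb(node, c=1.4):
    """Upper Confidence Bound: balances exploiting high-reward prompts
    against exploring rarely visited ones."""
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits
    )


def mcts_prompt_search(root_prompt, evaluate, revise, iterations=20, depth=3):
    """Toy MCTS over prompt space.

    evaluate(prompt) -> reward in [0, 1] (e.g., accuracy on a dev set)
    revise(prompt)   -> a revised prompt (an 'action', e.g., an LLM
                        rewrite driven by error feedback)
    """
    root = Node(root_prompt)
    for _ in range(iterations):
        # Selection: descend by UCB until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=ucb)
        # Expansion: attach one revised child prompt.
        child = Node(revise(node.prompt), parent=node)
        node.children.append(child)
        node = child
        # Simulation: roll out a few further revisions, then score.
        prompt = node.prompt
        for _ in range(depth):
            prompt = revise(prompt)
        reward = evaluate(prompt)
        # Backpropagation: push the reward up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Return the child of the root with the best average reward.
    best = max(root.children, key=lambda n: n.value / max(n.visits, 1))
    return best.prompt
```

In a realistic setup, `revise` would prompt an LLM with the current prompt plus concrete model errors, and `evaluate` would run the candidate prompt against labeled examples; the MCTS skeleton above stays the same.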

Xinyuan Wang, Chenxi Li, Zhen Wang, Fan Bai, Haotian Luo, Jiayou Zhang, Nebojsa Jojic, Eric P. Xing, Zhiting Hu • 2023

Related benchmarks

Task | Dataset | Result | Rank
Question Answering | GPQA (test) | Accuracy: 41.3 | 55
Coreference Resolution | WSC (test) | Accuracy: 82.7 | 11
Navigation Reasoning | BBH-Navigate (test) | Accuracy: 95.7 | 11
Fact Checking | LIAR (test) | Accuracy: 64.1 | 11
Mathematical Reasoning | AGIEval-MATH (test) | Accuracy: 41.4 | 11
Prompt Optimization | VisEval | Accuracy (Easy): 0.77 | 10
Mathematical Reasoning | MATH Levels 3, 4, 5 (test) | Accuracy (Level 3): 95 | 10
Prompt Optimization | DABench | Acc (Easy): 77 | 10
