Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
About
Popular prompt strategies like Chain-of-Thought Prompting can dramatically improve the reasoning abilities of Large Language Models (LLMs) in various domains. However, such hand-crafted prompt strategies are often sub-optimal. In this paper, we present Promptbreeder, a general-purpose self-referential self-improvement mechanism that evolves and adapts prompts for a given domain. Driven by an LLM, Promptbreeder mutates a population of task-prompts, and subsequently evaluates them for fitness on a training set. Crucially, the mutation of these task-prompts is governed by mutation-prompts that the LLM generates and improves throughout evolution in a self-referential way. That is, Promptbreeder is not just improving task-prompts, but it is also improving the mutation-prompts that improve these task-prompts. Promptbreeder outperforms state-of-the-art prompt strategies such as Chain-of-Thought and Plan-and-Solve Prompting on commonly used arithmetic and commonsense reasoning benchmarks. Furthermore, Promptbreeder is able to evolve intricate task-prompts for the challenging problem of hate speech classification.
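The loop described above (a population of task-prompts, each rewritten by an evolving mutation-prompt, with fitness-based selection) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `llm` and `fitness` functions are toy stand-ins (a real system would call a model and score accuracy on a training set), and all names are hypothetical. Only the control flow mirrors the described algorithm.

```python
import random

def llm(prompt: str) -> str:
    """Placeholder for a real LLM call; here it just appends a marker."""
    return prompt + " ~"

def fitness(task_prompt: str) -> float:
    """Toy fitness; Promptbreeder instead scores accuracy on a training set."""
    return float(len(task_prompt))

def evolve(n_units: int = 6, generations: int = 20, seed: int = 0):
    rng = random.Random(seed)
    # Each unit pairs a task-prompt with the mutation-prompt that rewrites it.
    units = [{"task": f"Solve the problem step by step ({i}).",
              "mut": "Rephrase this instruction to be clearer:"}
             for i in range(n_units)]
    for _ in range(generations):
        # Binary tournament: the fitter of two random units reproduces,
        # and its offspring overwrites the loser.
        a, b = rng.sample(range(n_units), 2)
        if fitness(units[a]["task"]) >= fitness(units[b]["task"]):
            w, l = a, b
        else:
            w, l = b, a
        winner = dict(units[w])
        if rng.random() < 0.2:
            # Self-referential step: occasionally the mutation-prompt
            # itself is rewritten by the LLM before it is applied.
            winner["mut"] = llm(winner["mut"])
        # Mutate the winner's task-prompt using its mutation-prompt.
        child_task = llm(f"{winner['mut']}\n{winner['task']}")
        units[l] = {"task": child_task, "mut": winner["mut"]}
    return units

if __name__ == "__main__":
    pop = evolve()
    best = max(pop, key=lambda u: fitness(u["task"]))
    print(best["task"])
```

The key design point this sketch captures is that selection pressure acts on task-prompts, while mutation-prompts hitchhike with the winners and are themselves occasionally mutated, so both levels improve over generations.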
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Mathematical Reasoning | GSM8K (test) | Accuracy: 91.97 | 900 |
| Code Generation | HumanEval (test) | -- | 506 |
| Code Generation | MBPP (test) | -- | 298 |
| Mathematical Problem Solving | Gaokao MathQA | Accuracy: 76.6 | 60 |
| Question Answering | GPQA (test) | Accuracy: 40.9 | 55 |
| Question Answering | HotpotQA (test) | -- | 37 |
| Tool Learning | RestBench TMDB | Success Rate: 74.1 | 32 |
| Knowledge Intensive | Gaokao History | Accuracy: 81.5 | 30 |
| Function Calling | BFCL Single-Turn | Accuracy: 81.3 | 22 |
| Function Calling | BFCL Multi-turn | Accuracy: 36.2 | 22 |