Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
About
Popular prompt strategies like Chain-of-Thought Prompting can dramatically improve the reasoning abilities of Large Language Models (LLMs) in various domains. However, such hand-crafted prompt strategies are often sub-optimal. In this paper, we present Promptbreeder, a general-purpose self-referential self-improvement mechanism that evolves and adapts prompts for a given domain. Driven by an LLM, Promptbreeder mutates a population of task-prompts and subsequently evaluates them for fitness on a training set. Crucially, the mutation of these task-prompts is governed by mutation-prompts that the LLM generates and improves throughout evolution in a self-referential way. That is, Promptbreeder is not just improving task-prompts; it is also improving the mutation-prompts that improve these task-prompts. Promptbreeder outperforms state-of-the-art prompt strategies such as Chain-of-Thought and Plan-and-Solve Prompting on commonly used arithmetic and commonsense reasoning benchmarks. Furthermore, Promptbreeder is able to evolve intricate task-prompts for the challenging problem of hate speech classification.
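The loop described in the abstract (mutate task-prompts, score them, and let the mutation-prompts themselves evolve) can be illustrated with a small sketch. This is not the authors' implementation: the `Unit` class, the `evolve` function, the `hyper_rate` parameter, the binary-tournament selection scheme, and the wording of the hyper-mutation instruction are all assumptions made for illustration. The `llm` and `fitness` callables are hypothetical stand-ins for a real model call and a training-set evaluation.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Unit:
    """One member of the population: a task-prompt plus the
    mutation-prompt used to rewrite it (names are illustrative)."""
    task_prompt: str
    mutation_prompt: str


def evolve(
    population: List[Unit],
    llm: Callable[[str], str],        # hypothetical: text in, text out
    fitness: Callable[[str], float],  # hypothetical: scores a task-prompt
    generations: int,
    rng: random.Random,
    hyper_rate: float = 0.2,          # assumed chance of mutating the mutation-prompt
) -> List[Unit]:
    pop = list(population)
    for _ in range(generations):
        # Assumed selection scheme: pick two distinct units, keep the fitter one.
        i, j = rng.sample(range(len(pop)), 2)
        if fitness(pop[i].task_prompt) < fitness(pop[j].task_prompt):
            i, j = j, i
        winner = pop[i]

        # Self-referential step: occasionally ask the LLM to rewrite
        # the mutation-prompt itself before using it.
        mutation_prompt = winner.mutation_prompt
        if rng.random() < hyper_rate:
            mutation_prompt = llm(
                "Improve this instruction for rewriting prompts: " + mutation_prompt
            )

        # Ordinary step: apply the mutation-prompt to the winner's task-prompt.
        task_prompt = llm(mutation_prompt + "\n" + winner.task_prompt)

        # The mutated copy of the winner overwrites the loser.
        pop[j] = Unit(task_prompt, mutation_prompt)
    return pop


# Toy stand-ins so the sketch runs without a model or a dataset.
def toy_llm(text: str) -> str:
    return text + "!"


def toy_fitness(task_prompt: str) -> float:
    return float(task_prompt.count("!"))


seed_population = [
    Unit("Solve the problem.", "Rephrase the instruction."),
    Unit("Think step by step.", "Make the instruction more specific."),
    Unit("Answer carefully.", "Shorten the instruction."),
]
final_population = evolve(
    seed_population, toy_llm, toy_fitness, generations=10, rng=random.Random(0)
)
```

Because only the loser of each tournament is overwritten, the best fitness in the population never decreases under this scheme. In a real setting, `fitness` would run the task-prompt against a batch of training examples, and `llm` would call the model that drives the evolution.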
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K (test) | Accuracy | 91.97 | 954 |
| Code Generation | HumanEval (test) | -- | -- | 612 |
| Multi-task Language Understanding | MMLU | MMLU Accuracy | 50.8 | 442 |
| Code Generation | MBPP (test) | -- | -- | 405 |
| Mathematical Reasoning | AGIEval MATH | Accuracy | 45.9 | 99 |
| Question Answering | GPQA (test) | Accuracy | 40.9 | 65 |
| Mathematical Problem Solving | Gaokao MathQA | Accuracy | 76.6 | 60 |
| Logic Reasoning | Tracking Shuffled Objects BBH | Accuracy | 16.3 | 59 |
| Tool Learning | RestBench TMDB | Success Rate | 74.1 | 50 |
| Causal Reasoning | BBH Causal Judgement | Accuracy (BBH Causal Judgement) | 55.8 | 40 |