
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

About

Popular prompt strategies like Chain-of-Thought Prompting can dramatically improve the reasoning abilities of Large Language Models (LLMs) in various domains. However, such hand-crafted prompt strategies are often sub-optimal. In this paper, we present Promptbreeder, a general-purpose self-referential self-improvement mechanism that evolves and adapts prompts for a given domain. Driven by an LLM, Promptbreeder mutates a population of task-prompts and subsequently evaluates them for fitness on a training set. Crucially, the mutation of these task-prompts is governed by mutation-prompts that the LLM generates and improves throughout evolution in a self-referential way. That is, Promptbreeder is not just improving task-prompts; it is also improving the mutation-prompts that improve these task-prompts. Promptbreeder outperforms state-of-the-art prompt strategies such as Chain-of-Thought and Plan-and-Solve Prompting on commonly used arithmetic and commonsense reasoning benchmarks. Furthermore, Promptbreeder is able to evolve intricate task-prompts for the challenging problem of hate speech classification.
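The loop the abstract describes — a population of task-prompts mutated by LLM-generated mutation-prompts, with the mutation-prompts themselves occasionally mutated — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `llm` and `fitness` callables, the prompt wordings, and the binary-tournament selection are all assumptions made for the sake of a runnable example.

```python
import random

def promptbreeder(llm, train_set, fitness, generations=10, pop_size=4, seed=0):
    """Toy sketch of Promptbreeder's self-referential loop.

    `llm(prompt) -> str` and `fitness(task_prompt, train_set) -> float`
    are hypothetical callables supplied by the caller. Each population
    unit pairs a task-prompt with the mutation-prompt that evolves it.
    """
    rng = random.Random(seed)
    # Initialise a population of (task-prompt, mutation-prompt) units.
    population = [
        {"task": llm("Write an instruction for solving these tasks."),
         "mutation": llm("Write a hint for improving an instruction.")}
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Binary tournament: pit two random units against each other.
        a, b = rng.sample(population, 2)
        if fitness(a["task"], train_set) >= fitness(b["task"], train_set):
            winner, loser = a, b
        else:
            winner, loser = b, a
        # Overwrite the loser: mutate the winner's task-prompt,
        # guided by the winner's mutation-prompt.
        loser["task"] = llm(winner["mutation"] + " INSTRUCTION: " + winner["task"])
        # Self-reference: sometimes mutate the mutation-prompt itself.
        if rng.random() < 0.5:
            loser["mutation"] = llm("Improve this hint: " + winner["mutation"])
        else:
            loser["mutation"] = winner["mutation"]
    # Return the fittest unit found.
    return max(population, key=lambda u: fitness(u["task"], train_set))
```

With a stubbed-out `llm`, the loop runs end to end and returns a unit containing both an evolved task-prompt and its mutation-prompt; swapping in a real model and a task-accuracy fitness function recovers the shape of the method described above.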

Chrisantha Fernando, Dylan Banarse, Henryk Michalewski, Simon Osindero, Tim Rocktäschel • 2023

Related benchmarks

Task                       | Dataset                                                                  | Result                   | Rank
Question Answering         | GPQA (test)                                                              | Accuracy: 40.9           | 55
Structured JSON Generation | MultiWOZ, Super-NaturalInstructions, TruthfulQA, Self-Instruct (averaged) | Similarity Score: 0.8    | 16
Navigation Reasoning       | BBH-Navigate (test)                                                      | Accuracy: 96.3           | 11
Mathematical Reasoning     | AGIEval-MATH (test)                                                      | Accuracy: 45.9           | 11
Fact Checking              | LIAR (test)                                                              | Accuracy: 63.2           | 11
Coreference Resolution     | WSC (test)                                                               | Accuracy: 76.7           | 11
Prompt Optimization        | DABench                                                                  | Accuracy (Easy): 80      | 10
Prompt Optimization        | VisEval                                                                  | Accuracy (Easy): 0.76    | 10
Mathematical Reasoning     | MATH Levels 3, 4, 5 (test)                                               | Accuracy (Level 3): 92   | 10
