Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

About

Popular prompt strategies like Chain-of-Thought Prompting can dramatically improve the reasoning abilities of Large Language Models (LLMs) in various domains. However, such hand-crafted prompt-strategies are often sub-optimal. In this paper, we present Promptbreeder, a general-purpose self-referential self-improvement mechanism that evolves and adapts prompts for a given domain. Driven by an LLM, Promptbreeder mutates a population of task-prompts, and subsequently evaluates them for fitness on a training set. Crucially, the mutation of these task-prompts is governed by mutation-prompts that the LLM generates and improves throughout evolution in a self-referential way. That is, Promptbreeder is not just improving task-prompts, but it is also improving the mutation-prompts that improve these task-prompts. Promptbreeder outperforms state-of-the-art prompt strategies such as Chain-of-Thought and Plan-and-Solve Prompting on commonly used arithmetic and commonsense reasoning benchmarks. Furthermore, Promptbreeder is able to evolve intricate task-prompts for the challenging problem of hate speech classification.

Chrisantha Fernando, Dylan Banarse, Henryk Michalewski, Simon Osindero, Tim Rocktäschel • 2023
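
The loop described in the abstract (mutate task-prompts, score them on a training set, and let the mutation-prompts themselves be rewritten) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the paper's text above: `Unit`, `evolve`, `toy_llm`, `toy_fitness`, the `hyper_rate` parameter, and the separator strings. The LLM is replaced by a deterministic stub, the fitness function is a toy keyword score rather than accuracy on a training set, and the use of a binary tournament for selection is an assumption of this sketch.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

SEP = "\nINSTRUCTION: "                            # hypothetical separator
HYPER_PREFIX = "Improve this mutation-prompt: "    # hypothetical hyper-mutation request


@dataclass
class Unit:
    """One member of the population: a task-prompt plus the mutation-prompt that edits it."""
    task_prompt: str
    mutation_prompt: str
    fitness: float = 0.0


def toy_llm(text: str) -> str:
    """Deterministic stand-in for a real LLM call (assumption: no real model is queried)."""
    if text.startswith(HYPER_PREFIX):
        # "Improve" a mutation-prompt by asking it to favour step-wise reasoning.
        return text[len(HYPER_PREFIX):] + " Encourage step-wise reasoning."
    mutation_prompt, _, task_prompt = text.partition(SEP)
    if "step" in mutation_prompt.lower():
        return task_prompt + " Think step by step."
    return task_prompt + " Be concise."


def toy_fitness(task_prompt: str) -> float:
    """Toy score in [0, 1]; a real run would measure accuracy on a training set."""
    return min(task_prompt.lower().count("step by step"), 3) / 3


def evolve(
    population: List[Unit],
    llm: Callable[[str], str],
    fitness_fn: Callable[[str], float],
    generations: int,
    rng: random.Random,
    hyper_rate: float = 0.3,
) -> Unit:
    """Run a binary-tournament loop and return the fittest unit."""
    for unit in population:
        unit.fitness = fitness_fn(unit.task_prompt)
    for _ in range(generations):
        a, b = rng.sample(range(len(population)), 2)
        if population[a].fitness >= population[b].fitness:
            winner, loser = a, b
        else:
            winner, loser = b, a
        parent = population[winner]
        mutation_prompt = parent.mutation_prompt
        # Self-referential step: occasionally mutate the mutation-prompt itself.
        if rng.random() < hyper_rate:
            mutation_prompt = llm(HYPER_PREFIX + mutation_prompt)
        # Ordinary step: the mutation-prompt drives a rewrite of the task-prompt.
        task_prompt = llm(mutation_prompt + SEP + parent.task_prompt)
        # The mutated copy of the winner overwrites the loser.
        population[loser] = Unit(task_prompt, mutation_prompt, fitness_fn(task_prompt))
    return max(population, key=lambda u: u.fitness)


if __name__ == "__main__":
    pop = [
        Unit("Solve the problem.", "Rephrase the instruction."),
        Unit("Give the answer.", "Rewrite the instruction."),
        Unit("Work out the result.", "Vary the instruction."),
    ]
    best = evolve(pop, toy_llm, toy_fitness, generations=30, rng=random.Random(0))
    print(best.fitness, repr(best.task_prompt))
```

Because only the tournament loser is ever overwritten, the best fitness in the population never decreases in this sketch. Swapping `toy_llm` for a real model call and `toy_fitness` for an evaluation over training examples would be the natural next step, under whatever API the reader has available.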

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Mathematical Reasoning | GSM8K (test) | Accuracy: 91.97 | 954 |
| Code Generation | HumanEval (test) | -- | 612 |
| Multi-task Language Understanding | MMLU | MMLU Accuracy: 50.8 | 442 |
| Code Generation | MBPP (test) | -- | 405 |
| Mathematical Reasoning | AGIEval MATH | Accuracy: 45.9 | 99 |
| Question Answering | GPQA (test) | Accuracy: 40.9 | 65 |
| Mathematical Problem Solving | Gaokao MathQA | Accuracy: 76.6 | 60 |
| Logic Reasoning | Tracking Shuffled Objects BBH | Accuracy: 16.3 | 59 |
| Tool Learning | RestBench TMDB | Success Rate: 74.1 | 50 |
| Causal Reasoning | BBH Causal Judgement | Accuracy (BBH Causal Judgement): 55.8 | 40 |
Showing 10 of 59 rows
