Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
About
Popular prompt strategies like Chain-of-Thought Prompting can dramatically improve the reasoning abilities of Large Language Models (LLMs) in various domains. However, such hand-crafted prompt strategies are often sub-optimal. In this paper, we present Promptbreeder, a general-purpose self-referential self-improvement mechanism that evolves and adapts prompts for a given domain. Driven by an LLM, Promptbreeder mutates a population of task-prompts and then evaluates them for fitness on a training set. Crucially, the mutation of these task-prompts is governed by mutation-prompts that the LLM generates and improves throughout evolution in a self-referential way. That is, Promptbreeder is not just improving task-prompts; it is also improving the mutation-prompts that improve these task-prompts. Promptbreeder outperforms state-of-the-art prompt strategies such as Chain-of-Thought and Plan-and-Solve Prompting on commonly used arithmetic and commonsense reasoning benchmarks. Furthermore, Promptbreeder is able to evolve intricate task-prompts for the challenging problem of hate speech classification.
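The evolutionary loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `llm` function is a stub standing in for a real model call, and `fitness` uses prompt length as a toy proxy for accuracy on the training set. It shows the key structural idea, though: each unit of evolution pairs a task-prompt with a mutation-prompt, and the mutation-prompt itself is occasionally rewritten by the LLM (the self-referential step).

```python
import random

random.seed(0)  # deterministic for illustration

def llm(prompt: str) -> str:
    # Stub standing in for an LLM call; a real system would sample a
    # model and return its completion. Here we just append a suffix.
    return prompt.split("INSTRUCTION:")[-1].strip() + " Think step by step."

def fitness(task_prompt: str, train_set) -> float:
    # Toy proxy: in Promptbreeder this would be accuracy of the
    # task-prompt on a batch of training examples.
    return float(len(task_prompt))

def mutate_task(task_prompt: str, mutation_prompt: str) -> str:
    # A mutation-prompt conditions the LLM to rewrite a task-prompt.
    return llm(f"{mutation_prompt}\nINSTRUCTION: {task_prompt}")

def mutate_mutation(mutation_prompt: str) -> str:
    # Self-referential step: the mutation-prompt is itself improved.
    return llm(f"Improve this mutation instruction.\nINSTRUCTION: {mutation_prompt}")

def promptbreeder(pop, train_set, generations=20, hyper_rate=0.3):
    # pop: list of (task_prompt, mutation_prompt) units.
    for _ in range(generations):
        # Binary tournament: the fitter unit's mutant replaces the loser.
        a, b = random.sample(range(len(pop)), 2)
        if fitness(pop[a][0], train_set) < fitness(pop[b][0], train_set):
            a, b = b, a
        task, mut = pop[a]
        if random.random() < hyper_rate:
            mut = mutate_mutation(mut)  # evolve the mutation-prompt too
        pop[b] = (mutate_task(task, mut), mut)
    return max(pop, key=lambda u: fitness(u[0], train_set))

pop = [("Solve the problem.", "Rephrase the instruction to be clearer.")] * 4
best_task, best_mut = promptbreeder(list(pop), train_set=[])
```

Under the toy fitness function the evolved task-prompt simply grows, but the same loop structure applies when `fitness` scores real benchmark accuracy.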
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Question Answering | GPQA (test) | Accuracy | 40.9 | 55 |
| Structured JSON Generation | MultiWOZ, Super-NaturalInstructions, TruthfulQA, and Self-Instruct (averaged) | Similarity Score | 0.8 | 16 |
| Navigation Reasoning | BBH-Navigate (test) | Accuracy | 96.3 | 11 |
| Mathematical Reasoning | AGIEval-MATH (test) | Accuracy | 45.9 | 11 |
| Fact Checking | LIAR (test) | Accuracy | 63.2 | 11 |
| Coreference Resolution | WSC (test) | Accuracy | 76.7 | 11 |
| Prompt Optimization | DABench | Accuracy (Easy) | 80 | 10 |
| Prompt Optimization | VisEval | Accuracy (Easy) | 0.76 | 10 |
| Mathematical Reasoning | MATH Levels 3, 4, 5 (test) | Accuracy (Level 3) | 92 | 10 |