Prompt Engineering a Prompt Engineer

About

Prompt engineering is a challenging yet crucial task for optimizing the performance of large language models on customized tasks. It requires complex reasoning to examine the model's errors, hypothesize what is missing or misleading in the current prompt, and communicate the task with clarity. While recent works indicate that large language models can be meta-prompted to perform automatic prompt engineering, we argue that their potential is limited due to insufficient guidance for complex reasoning in the meta-prompt. We fill this gap by infusing into the meta-prompt three key components: detailed descriptions, context specification, and a step-by-step reasoning template. The resulting method, named PE2, exhibits remarkable versatility across diverse language tasks. It finds prompts that outperform "let's think step by step" by 6.3% on MultiArith and 3.1% on GSM8K, and outperforms competitive baselines on counterfactual tasks by 6.9%. Further, we show that PE2 can make targeted and highly specific prompt edits, rectify erroneous prompts, and induce multi-step plans for complex tasks.
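To make the three meta-prompt components concrete, here is a minimal sketch of how such a meta-prompt might be assembled. Everything in it is an illustrative assumption: the `FailureExample` type, the `build_meta_prompt` function, and all of the prompt wording are hypothetical and are not taken from PE2 itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailureExample:
    """One input the current prompt got wrong (hypothetical structure)."""
    input_text: str
    model_output: str
    expected: str


def build_meta_prompt(current_prompt: str, failures: List[FailureExample]) -> str:
    """Assemble a meta-prompt from three parts; all wording is illustrative."""
    # Component 1: a detailed description of the prompt-engineering task.
    description = (
        "You are refining a prompt for a language model. "
        "Study the failed examples and propose a better prompt."
    )
    # Component 2: context specification, i.e. how prompt and input interact.
    context = (
        "The prompt is placed before each input, and the text the model "
        "writes after it is treated as the answer."
    )
    # Component 3: a step-by-step reasoning template.
    reasoning_steps = [
        "1. For each example, is the output correct? If not, why?",
        "2. Does the current prompt describe the task accurately?",
        "3. Should the prompt be edited? If so, write the revised prompt.",
    ]
    examples = "\n\n".join(
        f"Input: {f.input_text}\nOutput: {f.model_output}\nExpected: {f.expected}"
        for f in failures
    )
    return "\n\n".join([
        description,
        context,
        f"Current prompt:\n{current_prompt}",
        f"Failed examples:\n{examples}",
        "Reason through these steps:\n" + "\n".join(reasoning_steps),
    ])


if __name__ == "__main__":
    failure = FailureExample("What is 2 + 2?", "5", "4")
    print(build_meta_prompt("Let's think step by step.", [failure]))
```

In an automatic prompt-engineering loop, the returned string would be sent to a language model and the revised prompt it proposes would be evaluated on held-out examples; that loop is omitted here.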

Qinyuan Ye, Maxamed Axmed, Reid Pryzant, Fereshte Khani • 2023

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	GSM8K (test)	Accuracy 50.5	816
Reasoning	BBH	Accuracy 75.53	726
Math Reasoning	GSM8K (test)	Accuracy 64	250
Medical Visual Question Answering	Slake	Accuracy 35.8	247
Mathematical Reasoning	AQUA-RAT	Accuracy 88.23	153
Arithmetic Reasoning	MultiArith (test)	Accuracy 92.3	115
Reasoning	BIG-Bench Hard (BBH) (test)	Average Accuracy 63.09	62
Emotion Detection	DailyDialog (test)	Micro-F1 0.3184	53
Counterfactual reasoning	Counterfactual Eval (dev)	Mean Score 63.4	52
Fine-grained Image Classification	CUB	Top-1 Acc 71.6	45

Showing 10 of 40 rows
