Prompt Engineering a Prompt Engineer

About

Prompt engineering is a challenging yet crucial task for optimizing the performance of large language models on customized tasks. It requires complex reasoning to examine the model's errors, hypothesize what is missing or misleading in the current prompt, and communicate the task with clarity. While recent works indicate that large language models can be meta-prompted to perform automatic prompt engineering, we argue that their potential is limited due to insufficient guidance for complex reasoning in the meta-prompt. We fill this gap by infusing into the meta-prompt three key components: detailed descriptions, context specification, and a step-by-step reasoning template. The resulting method, named PE2, exhibits remarkable versatility across diverse language tasks. It finds prompts that outperform "let's think step by step" by 6.3% on MultiArith and 3.1% on GSM8K, and outperforms competitive baselines on counterfactual tasks by 6.9%. Further, we show that PE2 can make targeted and highly specific prompt edits, rectify erroneous prompts, and induce multi-step plans for complex tasks.
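The abstract names the three components PE2 adds to the meta-prompt (detailed descriptions, context specification, and a step-by-step reasoning template) but does not show how such a meta-prompt could drive prompt search. The Python sketch below is one possible illustration under stated assumptions. The meta-prompt wording, the exact-match scoring, the single greedy edit per step, and every name (`build_meta_prompt`, `accuracy`, `optimize`, `toy_task`, `toy_editor`) are hypothetical and are not the authors' implementation. Both "models" are plain Python functions, so the example runs without an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Any function from prompt text to completion text counts as a "model".
# Real LLM calls would go here; the toy models below are hypothetical stand-ins.
Model = Callable[[str], str]
Dataset = List[Tuple[str, str]]  # (input, expected answer) pairs


@dataclass
class Failure:
    x: str         # task input
    expected: str  # gold answer
    got: str       # task model's (wrong) answer


def build_meta_prompt(prompt: str, failures: List[Failure]) -> str:
    """Assemble a meta-prompt from the three components named in the abstract.

    The wording is a hypothetical placeholder, not PE2's actual meta-prompt.
    """
    # 1. Detailed description of the prompt-engineering task.
    description = ("You are a prompt engineer. Inspect the errors below "
                   "and write an improved prompt for the task.")
    # 2. Context specification: how the prompt and input are combined.
    context = ("Context: the prompt is placed before each input, and the "
               "model's reply is compared with the expected answer.")
    # 3. Step-by-step reasoning template for diagnosing each error.
    reasoning = ("For each example, reason step by step: Is the output "
                 "correct? If not, what in the prompt is missing or "
                 "misleading? How should the prompt change?")
    cases = "\n".join(f"Input: {f.x}\nOutput: {f.got}\nExpected: {f.expected}"
                      for f in failures)
    return "\n\n".join([description, context, reasoning,
                        f"Current prompt:\n{prompt}",
                        f"Examples:\n{cases}",
                        "Reply with the new prompt only."])


def accuracy(prompt: str, data: Dataset, task: Model) -> float:
    """Fraction of examples whose answer exactly matches the gold answer."""
    return sum(task(f"{prompt}\n{x}") == y for x, y in data) / len(data)


def optimize(prompt: str, data: Dataset, task: Model, editor: Model,
             steps: int = 3) -> str:
    """Greedy loop (a simplification): accept an edit only if accuracy improves."""
    best, best_acc = prompt, accuracy(prompt, data, task)
    for _ in range(steps):
        # Collect the examples the current prompt gets wrong.
        failures = []
        for x, y in data:
            got = task(f"{best}\n{x}")
            if got != y:
                failures.append(Failure(x, y, got))
        if not failures:
            break  # every example is already correct
        # Ask the editor model for a revised prompt and score it.
        candidate = editor(build_meta_prompt(best, failures)).strip()
        cand_acc = accuracy(candidate, data, task)
        if cand_acc > best_acc:
            best, best_acc = candidate, cand_acc
    return best


# Toy stand-ins: the task model answers numerically only when told to,
# and the editor always proposes that instruction.
def toy_task(text: str) -> str:
    return "4" if text.startswith("Answer with a number.") else "four"


def toy_editor(meta_prompt: str) -> str:
    return "Answer with a number."


data = [("What is 2 + 2?", "4")]
print(optimize("Solve the problem.", data, toy_task, toy_editor))
# prints: Answer with a number.
```

Replacing `toy_task` and `toy_editor` with real LLM calls would turn this into a basic automatic prompt engineering loop. The accept-if-better rule keeps the sketch short; the search procedure described in the paper itself is not reproduced here.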

Qinyuan Ye, Maxamed Axmed, Reid Pryzant, Fereshte Khani • 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Mathematical Reasoning | GSM8K (test) | Accuracy | 50.5 | 770
Reasoning | BBH | Accuracy | 75.53 | 672
Medical Visual Question Answering | Slake | Accuracy | 35.8 | 239
Math Reasoning | GSM8K (test) | Accuracy | 64 | 192
Mathematical Reasoning | AQUA-RAT | Accuracy | 88.23 | 120
Arithmetic Reasoning | MultiArith (test) | Accuracy | 92.3 | 67
Reasoning | BIG-Bench Hard (BBH) (test) | Average Accuracy | 63.09 | 56
Emotion Detection | DailyDialog (test) | Micro-F1 | 0.3184 | 53
Counterfactual Reasoning | Counterfactual Eval (dev) | Mean Score | 63.4 | 52
Video Classification | Drive&Act | Accuracy | 50.8 | 36

Showing 10 of 40 rows.
