
Prompt Engineering a Prompt Engineer

About

Prompt engineering is a challenging yet crucial task for optimizing the performance of large language models on customized tasks. It requires complex reasoning to examine the model's errors, hypothesize what is missing or misleading in the current prompt, and communicate the task with clarity. While recent works indicate that large language models can be meta-prompted to perform automatic prompt engineering, we argue that their potential is limited due to insufficient guidance for complex reasoning in the meta-prompt. We fill this gap by infusing into the meta-prompt three key components: detailed descriptions, context specification, and a step-by-step reasoning template. The resulting method, named PE2, exhibits remarkable versatility across diverse language tasks. It finds prompts that outperform "let's think step by step" by 6.3% on MultiArith and 3.1% on GSM8K, and outperforms competitive baselines on counterfactual tasks by 6.9%. Further, we show that PE2 can make targeted and highly specific prompt edits, rectify erroneous prompts, and induce multi-step plans for complex tasks.
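The abstract names three components infused into the meta-prompt: detailed descriptions, context specification, and a step-by-step reasoning template. The sketch below is a hypothetical illustration of how such a meta-prompt might be assembled; the function name, template wording, and failure-case format are assumptions, not the paper's actual templates.

```python
def build_meta_prompt(task_description, current_prompt, failure_cases):
    """Assemble a PE2-style meta-prompt (illustrative only) from the three
    components the paper names: a detailed description of the prompt-editing
    task, a context specification, and a step-by-step reasoning template."""
    examples = "\n".join(
        f"Input: {c['input']}\nModel output: {c['output']}\nExpected: {c['expected']}"
        for c in failure_cases
    )
    return (
        # 1. Detailed description of the prompt-editing task.
        "You are optimizing a prompt for a language model.\n"
        f"Task description: {task_description}\n\n"
        # 2. Context specification: how the prompt is used at inference time.
        "The current prompt, prepended to every input, is:\n"
        f"\"{current_prompt}\"\n\n"
        f"Examples the model currently gets wrong:\n{examples}\n\n"
        # 3. Step-by-step reasoning template guiding the edit.
        "Step 1: For each failed example, explain why the output is wrong.\n"
        "Step 2: Hypothesize what is missing or misleading in the prompt.\n"
        "Step 3: Propose a revised prompt that fixes these issues.\n"
    )

meta = build_meta_prompt(
    "Solve grade-school math word problems.",
    "Answer the question.",
    [{"input": "2 apples plus 3 apples?", "output": "23", "expected": "5"}],
)
print(meta)
```

In a full loop, the meta-prompted model's proposed revision would replace `current_prompt`, be re-evaluated on the task, and feed the next round of failure cases.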

Qinyuan Ye, Maxamed Axmed, Reid Pryzant, Fereshte Khani • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Mathematical Reasoning | GSM8K (test) | Accuracy | 50.5 | 751 |
| Math Reasoning | GSM8K (test) | Accuracy | 64 | 155 |
| Medical Visual Question Answering | Slake | Accuracy | 35.8 | 134 |
| Arithmetic Reasoning | MultiArith (test) | Accuracy | 92.3 | 67 |
| Emotion Detection | DailyDialog (test) | Micro-F1 | 0.3184 | 53 |
| Counterfactual Reasoning | Counterfactual Eval (dev) | Mean Score | 63.4 | 52 |
| Video Classification | Drive&Act | Accuracy | 50.8 | 36 |
| Reasoning | BIG-Bench Hard (BBH) (test) | Average Accuracy | 63.09 | 28 |
| Fine-grained Image Classification | CUB | Top-1 Acc | 71.6 | 22 |
| Date Understanding | BIG-bench Hard Date Understanding (test) | Test Accuracy | 56 | 14 |

Showing 10 of 36 rows.
