Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers

About

Automatic prompt optimization is an important approach to improving the performance of large language models (LLMs). Recent research demonstrates the potential of using LLMs as prompt optimizers, which can generate improved task prompts via iterative refinement. In this paper, we propose a novel perspective to investigate the design of LLM-based prompt optimizers, by drawing an analogy with gradient-based model optimizers. To connect these two approaches, we identify two pivotal factors in model parameter learning: update direction and update method. By systematically analyzing a rich set of improvement strategies on the two aspects, we further develop a capable Gradient-inspired LLM-based Prompt Optimizer called GPO. At each step, it first retrieves relevant prompts from the optimization trajectory as the update direction. Then, it utilizes the generation-based refinement strategy to perform the update, while controlling the edit distance through a cosine-based decay strategy. Extensive experiments demonstrate the effectiveness and efficiency of GPO. In particular, GPO brings an additional improvement of up to 56.8% on Big-Bench Hard and 62.6% on MMLU compared to baseline methods. The code is available at https://github.com/RUCAIBox/GPO.
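The abstract says GPO limits how far each refinement step may move the prompt by decaying the allowed edit distance with a cosine schedule. The sketch below shows one way such a budget could be computed and checked. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the word-level Levenshtein distance, and the `max_edits` / `min_edits` parameters are all hypothetical, and the schedule is the standard cosine-annealing form rather than a formula taken from the paper.

```python
import math


def cosine_decay_edit_budget(step, total_steps, max_edits, min_edits=0):
    """Allowed number of edited words at `step` (0-indexed).

    Assumption: the budget follows the usual cosine-annealing curve,
    starting at `max_edits` and ending at `min_edits`.
    """
    if total_steps <= 1:
        return max_edits
    # Clamp the step so out-of-range values do not leave the schedule.
    clamped = min(max(step, 0), total_steps - 1)
    progress = clamped / (total_steps - 1)
    scale = 0.5 * (1.0 + math.cos(math.pi * progress))
    return round(min_edits + (max_edits - min_edits) * scale)


def word_edit_distance(old_prompt, new_prompt):
    """Levenshtein distance over whitespace-separated words (an assumption)."""
    old_words, new_words = old_prompt.split(), new_prompt.split()
    prev = list(range(len(new_words) + 1))
    for i, old_word in enumerate(old_words, 1):
        curr = [i]
        for j, new_word in enumerate(new_words, 1):
            cost = 0 if old_word == new_word else 1
            curr.append(min(prev[j] + 1,          # delete a word
                            curr[j - 1] + 1,      # insert a word
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def within_budget(old_prompt, new_prompt, step, total_steps, max_edits):
    """True if the candidate prompt stays inside the current edit budget."""
    budget = cosine_decay_edit_budget(step, total_steps, max_edits)
    return word_edit_distance(old_prompt, new_prompt) <= budget
```

In an optimization loop, a candidate produced by the LLM at a given step would be accepted only if `within_budget` returns `True`, so early steps permit large rewrites and later steps permit only small ones.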

Xinyu Tang, Xiaolei Wang, Wayne Xin Zhao, Siyuan Lu, Yaliang Li, Ji-Rong Wen • 2024

Related benchmarks

Task	Dataset	Metric	Result
Mathematical Reasoning	GSM8K	Accuracy	95.04	1424
Text-to-SQL	Text-to-SQL Multi-sharded	Functional Accuracy	52.3	35
Task-oriented Dialogue	FnCTOD	Success Rate	57.9	25
Fact Verification	LIAR	F1 Score	57.85	24
Symbolic and Logical Reasoning	Big-Bench Hard (BBH)	Exact Match Performance	82.61	22
Safety Control	DeepSeek-R1-Distill-Qwen-1.5B	P_safeguarded (Safety-Quality Score)	84.3	17
Safety Control	BlackSheep Llama3.2-3B	Safety-Quality Score (P_safeguarded)	86.9	17
Safety Control	Evil-Alpaca 3B L3.2	Safety-Quality Score (P_safeguarded)	89.7	17
Safety Control	Macro Metrics Aggregate across LLMs	Macro-P Safeguarded Safety-Quality Score	77.6	17
Safety Control	DialoGPT large	Safety-Quality Score	0.497	17

