Large Language Models as Optimizers

About

Optimization is ubiquitous. While derivative-based algorithms have been powerful tools for various problems, the absence of gradient imposes challenges on many real-world applications. In this work, we propose Optimization by PROmpting (OPRO), a simple and effective approach to leverage large language models (LLMs) as optimizers, where the optimization task is described in natural language. In each optimization step, the LLM generates new solutions from the prompt that contains previously generated solutions with their values, then the new solutions are evaluated and added to the prompt for the next optimization step. We first showcase OPRO on linear regression and traveling salesman problems, then move on to our main application in prompt optimization, where the goal is to find instructions that maximize the task accuracy. With a variety of LLMs, we demonstrate that the best prompts optimized by OPRO outperform human-designed prompts by up to 8% on GSM8K, and by up to 50% on Big-Bench Hard tasks. Code at https://github.com/google-deepmind/opro.
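The optimization step described in the abstract can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `mock_llm` is a hypothetical stand-in that parses the prompt and perturbs the best listed solution instead of calling a real model, and the names `build_meta_prompt`, `squared_error`, and `opro`, the prompt wording, and the toy linear regression data are all invented for this example.

```python
import random
import re
from typing import Callable, List, Tuple

Solution = Tuple[float, float]   # (w, b) for the toy model y = w*x + b
Scored = Tuple[Solution, float]  # a solution paired with its loss


def squared_error(solution: Solution, data: List[Tuple[float, float]]) -> float:
    """Evaluate a candidate: sum of squared residuals on the data."""
    w, b = solution
    return sum((w * x + b - y) ** 2 for x, y in data)


def build_meta_prompt(history: List[Scored], top_k: int = 5) -> str:
    """Describe the task in natural language and list the best past solutions."""
    best = sorted(history, key=lambda item: item[1])[:top_k]
    rows = [f"w={w:.3f}, b={b:.3f}, loss={value:.3f}" for (w, b), value in best]
    return (
        "Task: fit y = w*x + b with the lowest possible loss.\n"
        "Previous solutions and their losses:\n"
        + "\n".join(rows)
        + "\nPropose new (w, b) pairs with a lower loss."
    )


def mock_llm(prompt: str, rng: random.Random, n_candidates: int = 4) -> List[Solution]:
    """Hypothetical stand-in for an LLM call (assumption, not a real API).

    It reads the solutions listed in the prompt and returns random
    perturbations of the one with the lowest loss.
    """
    rows = re.findall(r"w=(-?\d+\.\d+), b=(-?\d+\.\d+), loss=(\d+\.\d+)", prompt)
    w, b, _ = min(rows, key=lambda row: float(row[2]))
    return [
        (float(w) + rng.gauss(0.0, 0.5), float(b) + rng.gauss(0.0, 0.5))
        for _ in range(n_candidates)
    ]


def opro(
    data: List[Tuple[float, float]],
    propose: Callable[[str, random.Random], List[Solution]],
    steps: int = 50,
    seed: int = 0,
) -> Scored:
    """Run the generate, evaluate, and append loop; return the best scored solution."""
    rng = random.Random(seed)
    start: Solution = (0.0, 0.0)
    history: List[Scored] = [(start, squared_error(start, data))]
    for _ in range(steps):
        prompt = build_meta_prompt(history)        # past solutions with their values
        for candidate in propose(prompt, rng):     # new solutions from the prompt
            history.append((candidate, squared_error(candidate, data)))  # evaluate, add
    return min(history, key=lambda item: item[1])


# Toy data generated from y = 3x + 2 (an assumption for this sketch).
data = [(float(x), 3.0 * x + 2.0) for x in range(5)]
best_solution, best_loss = opro(data, mock_llm)
print(best_solution, best_loss)
```

Replacing `mock_llm` with a function that sends the prompt to an actual model and parses its reply would leave the loop itself unchanged, since the optimizer only ever sees the text of the prompt.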

Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou, Xinyun Chen • 2023

Related benchmarks

Task	Dataset	Metric	Result
Mathematical Reasoning	GSM8K	Accuracy	89.16	1398
Mathematical Reasoning	GSM8K (test)	Accuracy	89.6	954
Multi-task Language Understanding	MMLU	--	--	881
Mathematical Reasoning	GSM8K (test)	Accuracy	72.8	816
Reasoning	BBH	Accuracy	74.12	726
Multi-task Language Understanding	MMLU	MMLU Accuracy	56.8	442
Mathematical Reasoning	SVAMP	Accuracy	86.33	403
Commonsense Reasoning	CSQA	Accuracy	67.73	366
Mathematical Reasoning	GSM8K	Accuracy (GSM8K)	33.66	358
Multi-hop Question Answering	HotpotQA (test)	F1	25.55	311

Showing 10 of 198 rows
