SEE: Strategic Exploration and Exploitation for Cohesive In-Context Prompt Optimization

About

Designing optimal prompts for Large Language Models (LLMs) is a complicated and resource-intensive task, often requiring substantial human expertise and effort. Existing approaches typically optimize prompt instructions and in-context learning examples separately, leading to incohesive prompts and suboptimal task performance. To overcome these challenges, we propose a novel Cohesive In-Context Prompt Optimization framework that jointly refines both prompt instructions and examples. However, formulating such an optimization in the discrete and high-dimensional space of natural language poses significant challenges in both convergence and computational efficiency. To address these issues, we introduce SEE, a scalable and efficient prompt optimization framework that adopts metaheuristic optimization principles and strategically balances exploration and exploitation to enhance optimization performance and achieve efficient convergence. SEE features a quad-phased design that alternates between global traversal (exploration) and local optimization (exploitation) and adaptively chooses LLM operators during the optimization process. We have conducted a comprehensive evaluation across 35 benchmark tasks, and SEE outperforms state-of-the-art baseline methods by a large margin, achieving an average performance gain of 13.94 while reducing computational costs by 58.67.
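The alternation between exploration and exploitation described above can be illustrated with a toy sketch. It is not the authors' implementation: the word and example pools, the `score` function, and the operators `random_prompt`, `crossover`, and `local_edit` are hypothetical stand-ins for LLM calls and task evaluation. The loop also simply alternates two phases on a fixed schedule rather than using SEE's quad-phased design or adaptive operator choice.

```python
import random

# Hypothetical toy search space: a prompt is (instruction words, in-context examples).
VOCAB = ["classify", "sentiment", "positive", "negative", "please", "text", "the", "answer"]
EXAMPLE_POOL = ["great movie -> positive", "awful plot -> negative",
                "the sky -> blue", "2 + 2 -> 4"]
TARGET_WORDS = {"classify", "sentiment", "positive", "negative"}
GOOD_EXAMPLES = {"great movie -> positive", "awful plot -> negative"}


def score(prompt):
    """Toy stand-in for task accuracy; scores instruction and examples jointly (max 6)."""
    instruction, examples = prompt
    return len(set(instruction) & TARGET_WORDS) + len(set(examples) & GOOD_EXAMPLES)


def random_prompt(rng):
    """Exploration operator: sample a fresh prompt from anywhere in the space."""
    instruction = tuple(rng.choice(VOCAB) for _ in range(4))
    examples = tuple(rng.sample(EXAMPLE_POOL, 2))
    return (instruction, examples)


def crossover(a, b):
    """Exploration operator: combine one parent's instruction with the other's examples."""
    return (a[0], b[1])


def local_edit(prompt, rng):
    """Exploitation operator: change a single word or a single example."""
    instruction, examples = list(prompt[0]), list(prompt[1])
    if rng.random() < 0.5:
        instruction[rng.randrange(len(instruction))] = rng.choice(VOCAB)
    else:
        examples[rng.randrange(len(examples))] = rng.choice(EXAMPLE_POOL)
    return (tuple(instruction), tuple(examples))


def optimize(rounds=20, pop_size=4, seed=0):
    """Alternate exploration and exploitation rounds, keeping the best candidates."""
    rng = random.Random(seed)
    population = sorted((random_prompt(rng) for _ in range(pop_size)),
                        key=score, reverse=True)
    history = [score(population[0])]
    for step in range(rounds):
        if step % 2 == 0:  # exploration: global moves
            a, b = rng.sample(population, 2)
            candidates = [crossover(a, b), crossover(b, a), random_prompt(rng)]
        else:  # exploitation: small edits around current prompts
            candidates = [local_edit(p, rng) for p in population]
        # Elitist selection: the best score can never decrease between rounds.
        population = sorted(population + candidates, key=score, reverse=True)[:pop_size]
        history.append(score(population[0]))
    return population[0], history


if __name__ == "__main__":
    best, history = optimize()
    print("best prompt:", best)
    print("best score per round:", history)
```

Because selection keeps the top candidates from the union of old and new prompts, the best score is non-decreasing across rounds. In a real setting the scoring function would be an evaluation on held-out task data and the operators would be LLM calls.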

Wendi Cui, Zhuohang Li, Hao Sun, Damien Lopez, Kamalika Das, Bradley Malin, Sricharan Kumar, Jiaxin Zhang • 2024

Related benchmarks

Task	Dataset	Metric	Result
Medical Visual Question Answering	Slake	Accuracy	35	247
Fine-grained Image Classification	CUB	Top-1 Acc	71.6	45
Video Classification	Drive&Act	Accuracy	51.7	36
Image Classification	PlantVillage	Accuracy	69	12
Molecular property prediction	Absorption	Accuracy	71.4	12
Molecular property prediction	CYP Inhibit	Accuracy	61.4	12
Remote Sensing Visual Question Answering	RSVQA	Accuracy	53.4	12
Video Question Answering	VANE	Accuracy	57.9	12
Visual Question Answering	DrivingVQA	Accuracy	52.2	12
Function Calling	BFCL rand (test)	Accuracy	52.2	4

Showing 10 of 13 rows
