SEE: Strategic Exploration and Exploitation for Cohesive In-Context Prompt Optimization

About

Designing optimal prompts for Large Language Models (LLMs) is a complex and resource-intensive task, often requiring substantial human expertise and effort. Existing approaches typically optimize prompt instructions and in-context learning examples separately, which leads to incohesive prompts and suboptimal task performance. To overcome these challenges, we propose a novel Cohesive In-Context Prompt Optimization framework that refines prompt instructions and examples jointly. However, formulating such an optimization in the discrete, high-dimensional space of natural language poses significant challenges for both convergence and computational efficiency. To address these issues, we introduce SEE, a scalable and efficient prompt optimization framework that adopts metaheuristic optimization principles and strategically balances exploration and exploitation to improve optimization performance and converge efficiently. SEE features a quad-phased design that alternates between global traversal (exploration) and local optimization (exploitation) and adaptively chooses LLM operators during the optimization process. In a comprehensive evaluation across 35 benchmark tasks, SEE significantly outperforms state-of-the-art baseline methods, achieving an average performance gain of 13.94 while reducing computational costs by 58.67.
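The alternation between global traversal and local optimization can be illustrated with a toy search loop. This is a minimal sketch under loud assumptions, not the SEE algorithm: the abstract does not specify the four phases or the LLM operators, so the candidates here are bit strings standing in for prompts, `score` is a placeholder for task accuracy, and every function name (`explore`, `exploit`, `optimize`) is hypothetical.

```python
import random

def score(candidate):
    # Toy objective standing in for task accuracy (assumption): count of 1 bits.
    return sum(candidate)

def explore(pool, rng):
    # Global move (assumption): cross over two random parents, then flip 3 bits.
    a, b = rng.sample(pool, 2)
    cut = rng.randrange(1, len(a))
    child = a[:cut] + b[cut:]
    for i in rng.sample(range(len(child)), k=3):
        child[i] = 1 - child[i]
    return child

def exploit(pool, rng):
    # Local move (assumption): flip one bit of the current best candidate.
    best = max(pool, key=score)
    child = list(best)
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]
    return child

def optimize(n_bits=20, pool_size=6, phases=4, steps_per_phase=30, seed=0):
    rng = random.Random(seed)
    pool = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pool_size)]
    history = [max(score(c) for c in pool)]
    for phase in range(phases):
        # Even phases explore globally, odd phases exploit locally.
        move = explore if phase % 2 == 0 else exploit
        for _ in range(steps_per_phase):
            pool.append(move(pool, rng))
            # Elitist selection: keep only the top-scoring candidates.
            pool = sorted(pool, key=score, reverse=True)[:pool_size]
        history.append(score(pool[0]))
    return pool[0], history

best, history = optimize()
print(history)  # best score before the first phase and after each of the four phases
```

Because selection is elitist, the best score never decreases from one phase to the next. In a real prompt optimizer the two move functions would presumably be replaced by LLM-driven edits to instructions and examples, and `score` by an evaluation on held-out task data.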

Wendi Cui, Zhuohang Li, Hao Sun, Damien Lopez, Kamalika Das, Bradley Malin, Sricharan Kumar, Jiaxin Zhang • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Medical Visual Question Answering | Slake | Accuracy: 35 | 134 |
| Video Classification | Drive&Act | Accuracy: 51.7 | 36 |
| Fine-grained Image Classification | CUB | Top-1 Acc: 71.6 | 22 |
| Image Classification | PlantVillage | Accuracy: 69 | 12 |
| Molecular property prediction | Absorption | Accuracy: 71.4 | 12 |
| Molecular property prediction | CYP Inhibit | Accuracy: 61.4 | 12 |
| Remote Sensing Visual Question Answering | RSVQA | Accuracy: 53.4 | 12 |
| Video Question Answering | VANE | Accuracy: 57.9 | 12 |
| Visual Question Answering | DrivingVQA | Accuracy: 52.2 | 12 |
| Function Calling | BFCL rand (test) | Accuracy: 52.2 | 4 |
Showing 10 of 13 rows
