SEE: Strategic Exploration and Exploitation for Cohesive In-Context Prompt Optimization
About
Designing optimal prompts for Large Language Models (LLMs) is a complex and resource-intensive task that often requires substantial human expertise and effort. Existing approaches typically optimize prompt instructions and in-context learning examples separately, which leads to incohesive prompts and suboptimal task performance. To overcome these challenges, we propose a novel Cohesive In-Context Prompt Optimization framework that refines prompt instructions and examples together. However, formulating such an optimization in the discrete, high-dimensional space of natural language poses significant challenges for both convergence and computational efficiency. To address these issues, we introduce SEE, a scalable and efficient prompt optimization framework that adopts metaheuristic optimization principles and strategically balances exploration and exploitation to improve optimization performance and converge efficiently. SEE features a quad-phased design that alternates between global traversal (exploration) and local optimization (exploitation) and adaptively chooses LLM operators during the optimization process. In a comprehensive evaluation across 35 benchmark tasks, SEE outperforms state-of-the-art baseline methods by a large margin, achieving an average performance gain of 13.94 while reducing computational costs by 58.67.
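To make the alternation between exploration and exploitation concrete, the following is a minimal sketch and not the SEE implementation. Everything in it is an assumption for illustration: the `Prompt` type, the four operator functions (plain-Python stand-ins for LLM operators), the phase order (explore, exploit, explore, exploit), the success-count weighting used to pick operators adaptively, and the toy scoring function.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Prompt:
    instruction: str
    examples: tuple = ()  # in-context examples, kept alongside the instruction


# Hypothetical stand-ins for LLM operators. A real system would call a model;
# here each operator only draws from a fixed candidate pool.
def rewrite_instruction(p, rng, pool):
    return Prompt(rng.choice(pool["instructions"]), p.examples)


def add_example(p, rng, pool):
    return Prompt(p.instruction, p.examples + (rng.choice(pool["examples"]),))


def swap_example(p, rng, pool):
    if not p.examples:
        return add_example(p, rng, pool)
    examples = list(p.examples)
    examples[rng.randrange(len(examples))] = rng.choice(pool["examples"])
    return Prompt(p.instruction, tuple(examples))


def drop_example(p, rng, pool):
    if not p.examples:
        return p
    examples = list(p.examples)
    del examples[rng.randrange(len(examples))]
    return Prompt(p.instruction, tuple(examples))


EXPLORE_OPS = [rewrite_instruction, add_example]  # broad moves (assumed split)
EXPLOIT_OPS = [swap_example, drop_example]        # local edits (assumed split)


def optimize(seed_prompt, score, pool, steps=6, pop_size=4, seed=0):
    rng = random.Random(seed)
    # One weight per operator; an operator gains weight when it improves on its parent.
    weights = {op.__name__: 1.0 for op in EXPLORE_OPS + EXPLOIT_OPS}
    population = [seed_prompt]
    # Four phases alternating exploration and exploitation (assumed order).
    phases = [("explore", EXPLORE_OPS), ("exploit", EXPLOIT_OPS)] * 2
    for name, ops in phases:
        for _ in range(steps):
            op = rng.choices(ops, weights=[weights[o.__name__] for o in ops])[0]
            # Exploration samples any survivor; exploitation refines the current best.
            parent = rng.choice(population) if name == "explore" else max(population, key=score)
            child = op(parent, rng, pool)
            if score(child) > score(parent):
                weights[op.__name__] += 1.0
            population.append(child)
            # Deduplicate in insertion order, then keep the top candidates.
            unique = list(dict.fromkeys(population))
            population = sorted(unique, key=score, reverse=True)[:pop_size]
    return max(population, key=score)


# Toy objective (assumption): reward a reasoning cue and up to three distinct examples.
def toy_score(p):
    bonus = 1.0 if "step by step" in p.instruction else 0.0
    return bonus + min(len(set(p.examples)), 3) - 0.1 * len(p.examples)


POOL = {
    "instructions": ["Answer the question.", "Think step by step, then answer."],
    "examples": ["Q: 1+1? A: 2", "Q: 2+3? A: 5", "Q: 4*2? A: 8", "Q: 9-4? A: 5"],
}

best = optimize(Prompt("Answer the question."), toy_score, POOL)
print(best, toy_score(best))
```

Because the seed prompt starts in the population and the best candidates always survive trimming, the returned prompt never scores below the seed under this sketch. In practice, `score` would be a task metric measured on held-out data, and each operator would be an LLM call.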
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Medical Visual Question Answering | Slake | Accuracy | 35 | 134 |
| Video Classification | Drive&Act | Accuracy | 51.7 | 36 |
| Fine-grained Image Classification | CUB | Top-1 Acc | 71.6 | 22 |
| Image Classification | PlantVillage | Accuracy | 69 | 12 |
| Molecular property prediction | Absorption | Accuracy | 71.4 | 12 |
| Molecular property prediction | CYP Inhibit | Accuracy | 61.4 | 12 |
| Remote Sensing Visual Question Answering | RSVQA | Accuracy | 53.4 | 12 |
| Video Question Answering | VANE | Accuracy | 57.9 | 12 |
| Visual Question Answering | DrivingVQA | Accuracy | 52.2 | 12 |
| Function Calling | BFCL rand (test) | Accuracy | 52.2 | 4 |