Task Facet Learning: A Structured Approach to Prompt Optimization
About
Given a task in the form of a basic description and its training examples, prompt optimization is the problem of synthesizing the given information into a text prompt for a large language model. Humans solve this problem by also considering the different facets that define a task (e.g., counter-examples, explanations, analogies) and including them in the prompt. However, it is unclear whether existing algorithmic approaches, based on iteratively editing a given prompt or automatically selecting a few in-context examples, can cover the multiple facets required to solve a complex task. In this work, we view prompt optimization as that of learning multiple facets of a task from a set of training examples. We exploit structure in the prompt optimization problem and break down a prompt into loosely coupled semantic sections. The proposed algorithm, UniPrompt, (1) clusters the input space and uses clustered batches so that each batch likely corresponds to a different facet of the task, and (2) utilizes a feedback mechanism to propose adding, editing or deleting a section, which in turn is aggregated over a batch to capture generalizable facets. Empirical evaluation on multiple datasets and a real-world task shows that prompts generated using \shortname{} obtain higher accuracy than human-tuned prompts and those from state-of-the-art methods. In particular, our algorithm can generate long, complex prompts that existing methods are unable to generate. Code for UniPrompt is available at https://aka.ms/uniprompt.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K (test) | Accuracy82.4 | 797 | |
| Code Generation | HumanEval (test) | -- | 444 | |
| Code Generation | MBPP (test) | -- | 276 | |
| Question Answering | ARC (test) | Accuracy90.5 | 67 | |
| Question Answering | MedQA (test) | Accuracy57.1 | 61 | |
| Reasoning | BBH (test) | -- | 40 | |
| Hate Speech Detection | Ethos (test) | Accuracy0.937 | 14 |