# GPS: General Per-Sample Prompter
## About
LLMs are sensitive to prompting: task performance often hinges on subtle, sometimes imperceptible variations in phrasing. As a result, crafting effective prompts manually remains challenging and time-consuming. Recent automatic prompting methods mitigate this difficulty but face three key limitations: (i) for each new task, they require large datasets to train good prompts; (ii) they rely on costly optimization loops that may take hours; (iii) they typically produce a single task-level prompt that does not adapt to the individual input problem to be solved. We propose GPS, the first general-purpose, per-sample prompting method. Without any task-specific tuning, GPS generates a tailored prompt for each unseen input, improving performance across diverse tasks. The prompter is trained with reinforcement learning on a suite of training tasks and includes a novel regularization for effectively adapting to per-sample prompting. Finally, we employ Minimum Bayes Risk (MBR) decoding to stabilize inference. Empirically, GPS is competitive: without training on any of these tasks, in contrast to the baselines, it attains the second-best result among baselines on text simplification, the third-best on summarization, and on-par results on classification. For in-domain prompting, we obtain state-of-the-art results on GSM8K. Our work shows the potential of a novel and effective paradigm for automatic prompting: generating adaptive, input-specific prompts without extensive optimization and without access to a task-specific training set. Our code is available at https://github.com/Batorskq/GPS.
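The inference pipeline described above — generate a per-sample prompt, sample several model outputs, then stabilize with Minimum Bayes Risk decoding — can be sketched as follows. This is a minimal illustration under stated assumptions, not the GPS implementation: `toy_prompter` and `toy_llm` are hypothetical stand-ins for the RL-trained prompter and the downstream LLM, and token-level Jaccard similarity stands in for whatever MBR utility function the method actually uses.

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two strings (stand-in MBR utility)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def mbr_select(candidates: list[str]) -> str:
    """Minimum Bayes Risk decoding: return the candidate with the highest
    total similarity to all other candidates (the 'consensus' output)."""
    def score(i: int) -> float:
        return sum(jaccard(candidates[i], candidates[j])
                   for j in range(len(candidates)) if j != i)
    return candidates[max(range(len(candidates)), key=score)]

# Hypothetical stand-in for the RL-trained prompter: it maps each input
# sample to a tailored instruction instead of using one fixed task prompt.
def toy_prompter(sample: str) -> str:
    return f"Solve step by step, then state the final answer: {sample}"

# Hypothetical stand-in for sampling n generations from the frozen LLM.
def toy_llm(prompt: str, n_samples: int = 3) -> list[str]:
    return ["the answer is 4", "so the answer is 4", "it is 5"]

sample = "What is 2 + 2?"
outputs = toy_llm(toy_prompter(sample))
print(mbr_select(outputs))  # the outlier "it is 5" is voted out
```

MBR selection makes inference robust to individual bad samples: an outlier generation scores low similarity against the rest and is never chosen, which is why it stabilizes the per-sample prompting step.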
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | MATH 500 | Accuracy (%) | 34.2 | 442 |
| Text Classification | TREC | Accuracy (%) | 72.8 | 207 |
| Medical Question Answering | MedQA | Accuracy (%) | 54.92 | 153 |
| Text Classification | MR | Accuracy (%) | 89.15 | 106 |
| Text Classification | SST-5 | Accuracy (%) | 55.16 | 52 |
| Text Classification | Subj | Classification Accuracy (%) | 65.1 | 48 |
| Text Classification | CR | Classification Accuracy (%) | 90.65 | 44 |
| Mathematical Reasoning | DeepMath | Accuracy (%) | 21.58 | 30 |
| Text Classification | AG's News | Accuracy (%) | 84.21 | 19 |