# GPS: General Per-Sample Prompter
## About
LLMs are sensitive to prompting: task performance often hinges on subtle, sometimes imperceptible variations in phrasing. As a result, crafting effective prompts manually remains challenging and time-consuming. Recent automatic prompting methods mitigate this difficulty but face three key limitations: (i) for each new task, they require large datasets to train good prompts; (ii) they rely on costly optimization loops that may take hours; (iii) they typically produce a single task-level prompt that does not adapt to the individual input problem to be solved. We propose GPS, the first general-purpose, per-sample prompting method. Without any task-specific tuning, GPS generates a tailored prompt for each unseen input, improving performance across diverse tasks. The prompter is trained with reinforcement learning on a suite of training tasks and includes a novel regularization for effectively adapting to per-sample prompting. Finally, we employ Minimum Bayes Risk (MBR) decoding to stabilize inference. Empirically, GPS is competitive: without training on any of these tasks, in contrast to the baselines, it attains the second-best result among baselines on text simplification, the third-best on summarization, and on-par results on classification. For in-domain prompting, we obtain state-of-the-art results on GSM8K. Our work shows the potential of a novel and effective paradigm for automatic prompting: generating adaptive, input-specific prompts without extensive optimization and without access to a task-specific training set. Our code is available at https://github.com/Batorskq/GPS.
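The inference pipeline described above — generate a per-sample prompt, sample several model outputs, then stabilize with Minimum Bayes Risk decoding — can be sketched as follows. This is a minimal illustration under stated assumptions, not the GPS implementation: `toy_prompter` and `toy_llm` are hypothetical stand-ins for the RL-trained prompter and the downstream LLM, and token-level Jaccard similarity stands in for whatever MBR utility function the method actually uses.

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two strings (stand-in MBR utility)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def mbr_select(candidates: list[str]) -> str:
    """Minimum Bayes Risk decoding: return the candidate with the highest
    total similarity to all other candidates (the 'consensus' output)."""
    def score(i: int) -> float:
        return sum(jaccard(candidates[i], candidates[j])
                   for j in range(len(candidates)) if j != i)
    return candidates[max(range(len(candidates)), key=score)]

# Hypothetical stand-in for the RL-trained prompter: it maps each input
# sample to a tailored instruction instead of using one fixed task prompt.
def toy_prompter(sample: str) -> str:
    return f"Solve step by step, then state the final answer: {sample}"

# Hypothetical stand-in for sampling n generations from the frozen LLM.
def toy_llm(prompt: str, n_samples: int = 3) -> list[str]:
    return ["the answer is 4", "so the answer is 4", "it is 5"]

sample = "What is 2 + 2?"
outputs = toy_llm(toy_prompter(sample))
print(mbr_select(outputs))  # the outlier "it is 5" is voted out
```

MBR selection makes inference robust to individual bad samples: an outlier generation scores low similarity against the rest and is never chosen, which is why it stabilizes the per-sample prompting step.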
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | MATH 500 | Accuracy (%) | 34.2 | 442 |
| Text Classification | TREC | Accuracy (%) | 72.8 | 207 |
| Medical Question Answering | MedQA | Accuracy (%) | 54.92 | 153 |
| Text Classification | MR | Accuracy (%) | 89.15 | 106 |
| Text Classification | SST-5 | Accuracy (%) | 55.16 | 52 |
| Text Classification | Subj | Classification Accuracy (%) | 65.1 | 48 |
| Text Classification | CR | Classification Accuracy (%) | 90.65 | 44 |
| Mathematical Reasoning | DeepMath | Accuracy (%) | 21.58 | 30 |
| Text Classification | AG's News | Accuracy (%) | 84.21 | 19 |