Submodular Evaluation Subset Selection in Automatic Prompt Optimization

About

Automatic prompt optimization reduces manual prompt engineering, but relies on task performance measured on a small, often randomly sampled evaluation subset as its main source of feedback signal. Despite this, how to select that evaluation subset is usually treated as an implementation detail. We study evaluation subset selection for prompt optimization from a principled perspective and propose SESS, a submodular evaluation subset selection method. We frame selection as maximizing an objective set function and show that, under mild conditions, it is monotone and submodular, enabling greedy selection with theoretical guarantees. Across GSM8K, MATH, and GPQA-Diamond, submodularly selected evaluation subsets can yield better optimized prompts than random or heuristic baselines.
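The greedy selection mentioned above can be illustrated with a minimal sketch. The abstract does not state the SESS objective, so the code below substitutes a facility-location function, a standard monotone submodular objective when similarities are nonnegative; it is an assumption for illustration, not the authors' method. The names `facility_location` and `greedy_select` and the toy similarity matrix are likewise hypothetical. For monotone submodular objectives under a cardinality constraint, the classical result is that greedy selection achieves at least a (1 - 1/e) fraction of the optimal value.

```python
from typing import Callable, List, Sequence

Objective = Callable[[Sequence[int]], float]


def facility_location(similarity: Sequence[Sequence[float]]) -> Objective:
    """Build f(S) = sum_i max_{j in S} similarity[i][j], with f(empty) = 0.

    Assumed stand-in objective (not the paper's): it rewards subsets in
    which every example has at least one similar selected representative.
    """
    def f(subset: Sequence[int]) -> float:
        if not subset:
            return 0.0
        return sum(max(row[j] for j in subset) for row in similarity)
    return f


def greedy_select(objective: Objective, n_items: int, budget: int) -> List[int]:
    """Pick up to `budget` items, each time adding the largest marginal gain."""
    selected: List[int] = []
    current = objective(selected)
    for _ in range(min(budget, n_items)):
        best_gain, best_item = None, None
        for j in range(n_items):
            if j in selected:
                continue
            gain = objective(selected + [j]) - current
            if best_gain is None or gain > best_gain:
                best_gain, best_item = gain, j
        selected.append(best_item)
        current = objective(selected)
    return selected


# Toy symmetric similarity matrix (hypothetical data): items 0-2 form one
# cluster with item 0 as its hub, items 3-5 form another with item 3 as hub.
similarity = [
    [1.0, 0.8, 0.7, 0.1, 0.0, 0.0],
    [0.8, 1.0, 0.3, 0.0, 0.1, 0.0],
    [0.7, 0.3, 1.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 1.0, 0.6, 0.5],
    [0.0, 0.1, 0.0, 0.6, 1.0, 0.2],
    [0.0, 0.0, 0.1, 0.5, 0.2, 1.0],
]
f = facility_location(similarity)
chosen = greedy_select(f, n_items=6, budget=2)
print(chosen)  # [0, 3]
```

On this toy input, a budget of two selects one hub from each cluster rather than two near-duplicates from the same cluster, which is the kind of coverage a random sample of the same size does not guarantee.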

Jinming Nian, Zhiyuan Peng, Hongwei Shang, Dae Hoon Park, Yi Fang • 2026

Related benchmarks

Task                               Dataset              Result                                  Rank
Mathematical Reasoning             GSM8K (test)         Accuracy 91.9                           900
Reasoning                          BBH                  Accuracy 81.7                           672
Math                               GSM8K                Accuracy 0.952                          206
Mathematics                        MATH                 MATH Accuracy 90.7                      85
Math Reasoning                     GSM-Hard             Accuracy 79.9                           67
Math Reasoning                     MultiArith           Accuracy 96.7                           65
Knowledge Reasoning                MMLU                 MMLU Knowledge Reasoning Accuracy 69.8  65
General Reasoning                  BIG-bench            Accuracy (General) 71.9                 36
Graduate-level Question Answering  GPQA Diamond (test)  Accuracy 37.4                           16
Mathematical Reasoning             MATH (test)          Exact Match (EM) 76.1                   16
Showing 10 of 11 rows
