
Self-Supervised Prompt Optimization

About

Well-designed prompts are crucial for enhancing large language models' (LLMs) reasoning capabilities while aligning their outputs with task requirements across diverse domains. However, manually designing prompts requires expertise and iterative experimentation. While existing prompt optimization methods aim to automate this process, they rely heavily on external references such as ground truth labels or human feedback, limiting their applicability in real-world scenarios where such data is unavailable or costly to obtain. To address this, we propose Self-Supervised Prompt Optimization (SPO), a cost-efficient framework that discovers effective prompts for both closed and open-ended tasks without requiring external references. Motivated by the observations that prompt quality manifests directly in LLM outputs and that LLMs can effectively assess adherence to task requirements, we derive evaluation and optimization signals purely from output comparisons. Specifically, SPO selects superior prompts through pairwise output comparisons evaluated by an LLM evaluator, followed by an LLM optimizer that aligns outputs with task requirements. Extensive experiments demonstrate that SPO outperforms state-of-the-art prompt optimization methods, achieving comparable or superior results at significantly lower cost (e.g., 1.1% to 5.6% of the cost of existing methods) and with fewer samples (e.g., three samples). The code is available at https://github.com/FoundationAgents/SPO.
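The loop described in the abstract can be sketched as follows. All function names and the toy `call_llm` stub are illustrative assumptions, not the authors' actual API; the real implementation lives at https://github.com/FoundationAgents/SPO.

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM call; deterministic toy behavior so the
    # sketch runs without an API key. Swap in a real client in practice.
    return f"output for: {prompt}"

def execute(prompt: str, samples: list[str]) -> list[str]:
    # Run the candidate prompt on a handful of task samples (SPO uses
    # as few as three) to produce outputs.
    return [call_llm(f"{prompt}\n\nInput: {s}") for s in samples]

def evaluate(task: str, out_a: list[str], out_b: list[str]) -> bool:
    # LLM-as-evaluator: pairwise comparison of two output sets against
    # the task requirements; no ground truth is consulted.
    verdict = call_llm(
        f"Task: {task}\nOutputs A: {out_a}\nOutputs B: {out_b}\n"
        "Which set better satisfies the task requirements? Answer A or B."
    )
    return "A" in verdict  # toy parse; True means A (current best) wins

def optimize(task: str, best_prompt: str, best_outputs: list[str]) -> str:
    # LLM-as-optimizer: propose a revised prompt that better aligns
    # outputs with the task requirements.
    return call_llm(
        f"Task: {task}\nCurrent prompt: {best_prompt}\n"
        f"Outputs: {best_outputs}\nPropose an improved prompt."
    )

def spo(task: str, seed_prompt: str, samples: list[str],
        iterations: int = 10) -> str:
    # Iterate: optimize -> execute -> pairwise-evaluate, keeping
    # whichever prompt the evaluator prefers.
    best_prompt = seed_prompt
    best_outputs = execute(best_prompt, samples)
    for _ in range(iterations):
        candidate = optimize(task, best_prompt, best_outputs)
        cand_outputs = execute(candidate, samples)
        if not evaluate(task, best_outputs, cand_outputs):
            best_prompt, best_outputs = candidate, cand_outputs
    return best_prompt
```

With a real LLM behind `call_llm`, the evaluator's pairwise verdicts supply the only selection signal, which is what lets SPO run without ground-truth references.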

Jinyu Xiang, Jiayi Zhang, Zhaoyang Yu, Xinbing Liang, Fengwei Teng, Jinhao Tu, Fashen Ren, Xiangru Tang, Sirui Hong, Chenglin Wu, Yuyu Luo • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy | 93 | 983 |
| Question Answering | GPQA | Accuracy | 82.5 | 258 |
| Coreference Resolution | WSC | Accuracy | 98.2 | 96 |
| Logical Reasoning | BBH | Accuracy | 99.8 | 93 |
| Question Answering | GPQA (test) | Accuracy | 41.8 | 55 |
| Symbolic and Logical Reasoning | Big-Bench Hard (BBH) | Exact Match Performance | 84.13 | 22 |
| Fact Verification | LIAR | F1 Score | 66.9 | 18 |
| Mathematical Reasoning | AGIEval MATH | Accuracy | 94.9 | 12 |
| Fact Checking | LIAR | Accuracy | 78 | 12 |
| Coreference Resolution | WSC (test) | Accuracy | 81.1 | 11 |

(Showing 10 of 13 rows)
