UPA: Unsupervised Prompt Agent via Tree-Based Search and Selection

About

Prompt agents have recently emerged as a promising paradigm for automated prompt optimization, framing prompt discovery as a sequential decision-making problem over a structured prompt space. While this formulation enables the use of advanced planning algorithms, these methods typically assume access to supervised reward signals, which are often unavailable in practical scenarios. In this work, we propose UPA, an Unsupervised Prompt Agent that realizes structured search and selection without relying on ground-truth (GT) rewards. Specifically, during search, UPA iteratively constructs an evolving tree structure to navigate the prompt space, guided by fine-grained and position-debiased pairwise comparisons from Large Language Models (LLMs). Crucially, as these local comparisons do not inherently yield a consistent global scale, we decouple systematic prompt exploration from final selection, introducing a two-stage framework grounded in the Bradley-Terry-Luce (BTL) model. This framework first performs path-wise Bayesian aggregation of local comparisons to filter candidates under uncertainty, followed by global tournament-style comparisons to infer latent prompt quality and identify the optimal prompt. Experiments across multiple tasks demonstrate that UPA consistently outperforms existing prompt optimization methods, showing that agent-style optimization can remain highly effective even in unsupervised settings.
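To make the selection stage more concrete: under the Bradley-Terry-Luce model, each candidate i has a latent strength p_i, and the probability that i beats j in a comparison is p_i / (p_i + p_j). The abstract does not say which estimator UPA uses, so the sketch below is only an illustration. It fits BTL strengths to a win-count matrix with the standard minorization-maximization (MM) update. The function name `fit_btl` and the tally in `wins` are hypothetical and are not taken from the paper.

```python
from typing import List


def fit_btl(wins: List[List[float]], iters: int = 200) -> List[float]:
    """Estimate Bradley-Terry-Luce strengths with the classic MM update.

    wins[i][j] is the number of times candidate i beat candidate j.
    Returns strengths normalized to sum to 1.
    """
    n = len(wins)
    p = [1.0 / n] * n  # start from a uniform guess
    for _ in range(iters):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Each pair contributes (number of comparisons) / (p_i + p_j).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            # Keep the old value if candidate i was never compared.
            new_p.append(total_wins / denom if denom > 0 else p[i])
        s = sum(new_p)
        p = [x / s for x in new_p]
    return p


# Hypothetical tally of pairwise judgments over three candidate prompts
# (illustrative numbers only, not results from the paper).
wins = [
    [0, 7, 9],
    [3, 0, 6],
    [1, 4, 0],
]
strengths = fit_btl(wins)
best = max(range(len(strengths)), key=strengths.__getitem__)
print(best, [round(x, 3) for x in strengths])
```

In this toy tally every pair is compared the same number of times, so the fitted ranking follows total wins. With uneven comparison counts, the fitted strengths also account for who each candidate was compared against, which is one reason to use a model such as BTL rather than raw win rates.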

Siran Peng, Weisong Zhao, Tianyu Fu, Chenxu Zhao, Tianshuo Zhang, Haoyuan Zhang, Xiangyu Zhu, Minghui Wu, Zhen Lei • 2026

Related benchmarks

Task	Dataset	Result
Question Answering	GPQA	Accuracy 84.2	258
Logical reasoning	BBH	Accuracy 100	249
Coreference Resolution	WSC	Accuracy 98.5	116
Mathematical Reasoning	AGIEval MATH	Accuracy 95.7	99
Question Answering	GPQA (test)	Accuracy 45.5	65
Mathematical Reasoning	AGIEval-MATH (test)	Accuracy 52.1	31
Fact Checking	LIAR	Accuracy 78.8	28
Coreference Resolution	WSC (test)	Accuracy 82.7	19
Fact Checking	LIAR (test)	Accuracy 68.2	11
Navigation Reasoning	BBH-Navigate (test)	Accuracy 98	11

Showing 10 of 11 rows
