Adaptive Prompt Structure Factorization: A Framework for Self-Discovering and Optimizing Compositional Prompt Programs

About

Automated prompt optimization is crucial for eliciting reliable reasoning from large language models (LLMs). Yet most API-only prompt optimizers iteratively edit monolithic prompts, which couples components, obscures credit assignment, limits controllability, and wastes tokens. We propose Adaptive Prompt Structure Factorization (aPSF), an API-only framework (prompt-in/text-out; no access to model internals) that uses an Architect model to discover task-specific prompt structures as semantic factors. aPSF then performs interventional, single-factor updates: interventional factor-level scoring estimates each factor's marginal contribution from changes in validation performance, and error-guided factor selection routes each update to the current dominant failure source for more sample-efficient optimization. Across multiple advanced reasoning benchmarks, aPSF outperforms strong baselines, including principle-aware optimizers, improving accuracy by up to 2.16 percentage points on average. On MultiArith, it reduces optimization token cost by 45 to 87% while reaching peak validation performance in one step.

Haoyue Liu, Zhichao Wang, Yongxin Guo, Haoran Shou, Xiaoying Tang • 2026
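
The loop the abstract describes (score each factor by intervention, pick the factor that errors point to, update only that factor) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`compose`, `factor_contributions`, `select_factor`, `single_factor_update`, `toy_score`) is hypothetical, ablating a factor is only one assumed way to realize an "intervention", and the keyword-based `toy_score` stands in for validation accuracy that a real system would obtain by calling an LLM.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Factors = Dict[str, str]          # factor name -> factor text
Scorer = Callable[[Factors], float]

def compose(factors: Factors) -> str:
    """Join factor texts into one prompt, in insertion order."""
    return "\n".join(f"[{name}] {text}" for name, text in factors.items())

def toy_score(factors: Factors) -> float:
    """Assumed stand-in for validation accuracy: fraction of cue phrases present."""
    cues = ("step by step", "final answer")
    prompt = compose(factors).lower()
    return sum(cue in prompt for cue in cues) / len(cues)

def factor_contributions(factors: Factors, score: Scorer) -> Dict[str, float]:
    """Estimate each factor's marginal contribution as the score drop when it is removed."""
    base = score(factors)
    return {
        name: base - score({k: v for k, v in factors.items() if k != name})
        for name in factors
    }

def select_factor(error_labels: List[str]) -> str:
    """Error-guided selection: pick the factor blamed by the most failures."""
    return Counter(error_labels).most_common(1)[0][0]

def single_factor_update(
    factors: Factors, target: str, candidates: List[str], score: Scorer
) -> Tuple[Factors, float]:
    """Try rewrites of one factor only; keep the best one if it beats the current score."""
    best, best_score = dict(factors), score(factors)
    for text in candidates:
        trial = {**factors, target: text}
        trial_score = score(trial)
        if trial_score > best_score:
            best, best_score = trial, trial_score
    return best, best_score

# Toy run: three factors, with failures mostly attributed to "reasoning".
factors = {
    "role": "You are a careful math tutor.",
    "reasoning": "Think.",
    "format": "Give the final answer on the last line.",
}
contributions = factor_contributions(factors, toy_score)
target = select_factor(["reasoning", "reasoning", "format"])
updated, new_score = single_factor_update(
    factors, target, ["Reason step by step.", "Be brief."], toy_score
)
print(contributions, target, new_score)
```

Because only one factor changes per step, any change in the score can be attributed to that factor, which is the credit-assignment benefit the abstract claims over editing a monolithic prompt.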

Related benchmarks

Task	Dataset	Result	
Multi-task Language Understanding	MMLU	--	881
Math Reasoning	GSM8K (test)	Accuracy: 90.87	250
Mathematical Reasoning	GSM-Hard	--	162
Mathematical Reasoning	GSM8K (val)	--	108
Math Reasoning	MultiArith (test)	Accuracy: 99.53	54
Mathematical Reasoning	AQuA-RAT (test)	Accuracy: 83	40
Math Reasoning	GSM-Hard (test)	Accuracy: 55.86	30
Mathematical Reasoning	AQUA (val)	Tokens at Best Step (K): 336	7
Mathematical Reasoning	MultiArith (val)	Tokens at Best Step (K): 206	7
Mathematical Reasoning	Competition Math (test)	Accuracy: 56	5

Showing 10 of 12 rows
