Adaptive Prompt Structure Factorization: A Framework for Self-Discovering and Optimizing Compositional Prompt Programs
About
Automated prompt optimization is crucial for eliciting reliable reasoning from large language models (LLMs), yet most API-only prompt optimizers iteratively edit monolithic prompts, which couples components, obscures credit assignment, limits controllability, and wastes tokens. We propose Adaptive Prompt Structure Factorization (aPSF), an API-only framework (prompt-in/text-out; no access to model internals) that uses an Architect model to discover task-specific prompt structures as semantic factors. aPSF then performs interventional, single-factor updates: interventional factor-level scoring estimates each factor's marginal contribution from the change in validation performance, and error-guided factor selection routes each update to the current dominant failure source for more sample-efficient optimization. Across multiple advanced reasoning benchmarks, aPSF outperforms strong baselines, including principle-aware optimizers, improving average accuracy by up to 2.16 percentage points. On MultiArith, it reduces optimization cost by 45-87% in tokens while reaching peak validation performance in a single step.
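The single-factor update described above can be illustrated with a minimal sketch. It is not the authors' implementation: the factor names, the `render` format, the exact-match accuracy metric, and the helpers `factor_scores`, `dominant_failure_factor`, and `apply_best_update` are all hypothetical, and `toy_llm` is a stand-in for a real prompt-in/text-out API call.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]            # (question, gold answer)
LLM = Callable[[str, str], str]      # (prompt, question) -> answer text


def render(factors: Dict[str, str]) -> str:
    """Join named factors into one prompt (hypothetical layout)."""
    return "\n\n".join(f"[{name}]\n{text}" for name, text in factors.items())


def accuracy(factors: Dict[str, str], llm: LLM, val: List[Example]) -> float:
    """Exact-match accuracy of the rendered prompt on a validation set."""
    prompt = render(factors)
    return sum(llm(prompt, q).strip() == a for q, a in val) / len(val)


def factor_scores(factors: Dict[str, str], candidates: Dict[str, str],
                  llm: LLM, val: List[Example]) -> Dict[str, float]:
    """Intervene on one factor at a time; score = change in validation accuracy."""
    base = accuracy(factors, llm, val)
    scores = {}
    for name, new_text in candidates.items():
        trial = dict(factors)        # copy so every other factor stays fixed
        trial[name] = new_text
        scores[name] = accuracy(trial, llm, val) - base
    return scores


def dominant_failure_factor(error_tags: List[str]) -> str:
    """Assumed error-guided rule: pick the factor blamed for the most failures."""
    return Counter(error_tags).most_common(1)[0][0]


def apply_best_update(factors: Dict[str, str], candidates: Dict[str, str],
                      llm: LLM, val: List[Example]) -> Dict[str, str]:
    """Keep only the single-factor edit with the largest positive score."""
    scores = factor_scores(factors, candidates, llm, val)
    best = max(scores, key=scores.get)
    if scores[best] <= 0:
        return factors               # no candidate helps; leave the prompt as is
    updated = dict(factors)
    updated[best] = candidates[best]
    return updated


def toy_llm(prompt: str, question: str) -> str:
    """Stand-in model: answers correctly only when told to reason step by step."""
    return "4" if "step by step" in prompt else "5"
```

Because each trial changes exactly one factor, a difference in validation accuracy can be attributed to that factor alone, which is the credit-assignment property that monolithic prompt edits lack. In practice `toy_llm` would be replaced by a call to the target model, and the error tags would come from whatever failure analysis the optimizer uses.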
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multi-task Language Understanding | MMLU | -- | 876 |
| Math Reasoning | GSM8K (test) | Accuracy: 90.87 | 192 |
| Mathematical Reasoning | GSM-Hard | -- | 162 |
| Mathematical Reasoning | GSM8K (val) | -- | 81 |
| Mathematical Reasoning | AQuA-RAT (test) | Accuracy: 83 | 40 |
| Math Reasoning | MultiArith (test) | Accuracy: 99.53 | 30 |
| Math Reasoning | GSM-Hard (test) | Accuracy: 55.86 | 30 |
| Mathematical Reasoning | AQUA (val) | Tokens at Best Step (K): 336 | 7 |
| Mathematical Reasoning | MultiArith (val) | Tokens at Best Step (K): 206 | 7 |
| Mathematical Reasoning | Competition Math (test) | Accuracy: 56 | 5 |