Error Taxonomy-Guided Prompt Optimization

About

Automatic Prompt Optimization (APO) is a powerful approach for extracting performance from large language models without modifying their weights. Many existing methods rely on trial-and-error, testing different prompts or in-context examples until a good configuration emerges, often consuming substantial compute. Recently, natural language feedback derived from execution logs has shown promise as a way to identify how prompts can be improved. However, most prior approaches operate in a bottom-up manner, iteratively adjusting the prompt based on feedback from individual problems, which can cause them to lose the global perspective. In this work, we propose Error Taxonomy-Guided Prompt Optimization (ETGPO), a prompt optimization algorithm that adopts a top-down approach. ETGPO focuses on the global failure landscape by collecting model errors, categorizing them into a taxonomy, and augmenting the prompt with guidance targeting the most frequent failure modes. Across multiple benchmarks spanning mathematics, question answering, and logical reasoning, ETGPO achieves accuracy that is comparable to or better than state-of-the-art methods, while requiring roughly one third of the optimization-phase token usage and evaluation budget.
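The top-down loop described above (collect errors, group them into a taxonomy, add guidance for the most frequent failure modes) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each failed example has already been assigned a category label (in practice that step would likely involve a model), and the names `build_taxonomy`, `augment_prompt`, `guidance`, and `top_k` are hypothetical.

```python
from collections import Counter


def build_taxonomy(error_labels):
    """Count how often each (already assigned) error category occurs."""
    return Counter(error_labels)


def augment_prompt(base_prompt, taxonomy, guidance, top_k=2):
    """Append guidance for the top_k most frequent error categories.

    `guidance` maps a category name to a short instruction; it is a
    hand-written stand-in for whatever guidance text the method generates.
    """
    lines = [base_prompt, "", "Avoid these common mistakes:"]
    for category, _count in taxonomy.most_common(top_k):
        lines.append(f"- {category}: {guidance[category]}")
    return "\n".join(lines)


# Illustrative, made-up error labels collected from failed examples.
error_labels = [
    "arithmetic slip",
    "misread question",
    "arithmetic slip",
    "unit confusion",
    "arithmetic slip",
    "misread question",
]
guidance = {
    "arithmetic slip": "Recompute each calculation before answering.",
    "misread question": "Restate what is being asked before solving.",
    "unit confusion": "Track units through every step.",
}

taxonomy = build_taxonomy(error_labels)
print(augment_prompt("Solve the problem.", taxonomy, guidance, top_k=2))
```

Because guidance is chosen from aggregate category counts rather than from one failed problem at a time, rare categories (here, the single "unit confusion" case) are left out of the prompt.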

Mayank Singh, Vikas Yadav, Eduardo Blanco • 2026

Related benchmarks

Task	Dataset	Metric	Result
Logical reasoning	FOLIO (test)	Accuracy	82.45	58
Logical reasoning	AR-LSAT (test)	Accuracy	91.44	24
Multi-hop Reasoning	MuSiQue (test)	Mean Accuracy	77.3	4
General Question Answering	MMLU Pro (test)	Mean Accuracy	79.4	4
Math Reasoning	AIME 2025 (test)	Mean Accuracy	49.06	4
General	MMLU Pro (test)	Accuracy	83.65	4
General	MMLU Pro (test)	Optimization Token Usage (k)	778	3
General Question Answering	MMLU Pro (test)	Optimization Token Usage	595	3
Logical reasoning	FOLIO	Optimization-phase Token Usage	453	3
Math Reasoning	AIME (test)	Token Usage (Optimization Phase, Thousands)	3.10e+3	3

Showing 10 of 15 rows
