Error Taxonomy-Guided Prompt Optimization

About

Automatic Prompt Optimization (APO) is a powerful approach for extracting performance from large language models without modifying their weights. Many existing methods rely on trial and error, testing different prompts or in-context examples until a good configuration emerges, which often consumes substantial compute. Recently, natural language feedback derived from execution logs has shown promise as a way to identify how prompts can be improved. However, most prior approaches operate bottom-up, iteratively adjusting the prompt based on feedback from individual problems, which can cause them to lose sight of the global picture. In this work, we propose Error Taxonomy-Guided Prompt Optimization (ETGPO), a prompt optimization algorithm that takes a top-down approach instead. ETGPO focuses on the global failure landscape: it collects model errors, categorizes them into a taxonomy, and augments the prompt with guidance targeting the most frequent failure modes. Across multiple benchmarks spanning mathematics, question answering, and logical reasoning, ETGPO achieves accuracy comparable to or better than state-of-the-art methods while requiring roughly one third of their optimization-phase token usage and evaluation budget.
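The abstract describes the top-down loop only at a high level: collect errors, categorize them into a taxonomy, and add guidance for the most frequent failure modes. The Python sketch below illustrates that idea under stated assumptions and is not the authors' implementation. The names `Failure`, `keyword_categorize`, `build_taxonomy`, and `augment_prompt` are hypothetical; the keyword matcher is a stand-in for the model-based error categorization a real system would presumably use, and the guidance strings are hand-written rather than generated.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Failure:
    """One incorrect prediction collected while running the current prompt."""
    question: str
    model_answer: str
    reference: str
    error_note: str  # free-text description of what went wrong


def keyword_categorize(failure: Failure) -> str:
    """Hypothetical stand-in for an LLM that assigns each error a category.

    A crude keyword match plays the categorizer's role so the sketch runs offline.
    """
    note = failure.error_note.lower()
    if "arithmetic" in note or "calculation" in note:
        return "arithmetic_slip"
    if "ignored" in note or "misread" in note:
        return "constraint_ignored"
    return "other"


def build_taxonomy(
    failures: List[Failure], categorize: Callable[[Failure], str]
) -> Counter:
    """Count how many collected failures fall into each error category."""
    return Counter(categorize(f) for f in failures)


def augment_prompt(
    base_prompt: str,
    taxonomy: Counter,
    guidance: Dict[str, str],
    top_k: int = 2,
) -> str:
    """Append guidance for the top_k most frequent failure modes only."""
    top_categories = [cat for cat, _ in taxonomy.most_common(top_k)]
    tips = [f"- {guidance[cat]}" for cat in top_categories if cat in guidance]
    if not tips:
        return base_prompt  # nothing to add; keep the prompt unchanged
    return base_prompt + "\n\nCommon mistakes to avoid:\n" + "\n".join(tips)


if __name__ == "__main__":
    failures = [
        Failure("Q1", "12", "14", "arithmetic error when adding terms"),
        Failure("Q2", "7", "9", "calculation slip in final step"),
        Failure("Q3", "A", "C", "ignored the 'not' in the premise"),
        Failure("Q4", "30", "31", "arithmetic error in carry"),
        Failure("Q5", "B", "D", "misread which option was asked for"),
        Failure("Q6", "yes", "no", "unsupported leap in reasoning"),
    ]
    guidance = {
        "arithmetic_slip": "Re-verify every arithmetic step before answering.",
        "constraint_ignored": "List all constraints in the question and check each one.",
        "other": "Justify each inference from the given premises.",
    }
    taxonomy = build_taxonomy(failures, keyword_categorize)
    print(taxonomy.most_common())
    print(augment_prompt("Solve the problem step by step.", taxonomy, guidance))
```

On this toy data the taxonomy counts three arithmetic slips, two ignored constraints, and one other error, so with `top_k=2` only the first two categories contribute guidance. The contrast with a bottom-up method is that the prompt is edited once from aggregate error counts rather than after each individual failure.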

Mayank Singh, Vikas Yadav, Eduardo Blanco • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Logical Reasoning | FOLIO (test) | Accuracy | 82.45 | 58
Logical Reasoning | AR-LSAT (test) | Accuracy | 91.44 | 24
Multi-hop Reasoning | MuSiQue (test) | Mean accuracy | 77.3 | 4
General Question Answering | MMLU Pro (test) | Mean accuracy | 79.4 | 4
Math Reasoning | AIME 2025 (test) | Mean accuracy | 49.06 | 4
General | MMLU Pro (test) | Accuracy | 83.65 | 4
General | MMLU Pro (test) | Optimization-phase token usage (thousands) | 778 | 3
General Question Answering | MMLU Pro (test) | Optimization-phase token usage | 595 | 3
Logical Reasoning | FOLIO | Optimization-phase token usage | 453 | 3
Math Reasoning | AIME (test) | Optimization-phase token usage (thousands) | 3,100 | 3