Generalizable Self-Evolving Memory for Automatic Prompt Optimization

About

Automatic prompt optimization is a promising approach for adapting large language models (LLMs) to downstream tasks, yet existing methods typically search for a specific prompt specialized to a fixed task. This paradigm limits generalization across heterogeneous queries and prevents models from accumulating reusable prompting knowledge over time. In this paper, we propose MemAPO, a memory-driven framework that reconceptualizes prompt optimization as generalizable and self-evolving experience accumulation. MemAPO maintains a dual-memory mechanism that distills successful reasoning trajectories into reusable strategy templates while organizing incorrect generations into structured error patterns that capture recurrent failure modes. Given a new prompt, the framework retrieves both relevant strategies and failure patterns to compose prompts that promote effective reasoning while discouraging known mistakes. Through iterative self-reflection and memory editing, MemAPO continuously updates its memory, enabling prompt optimization to improve over time rather than restarting from scratch for each task. Experiments on diverse benchmarks show that MemAPO consistently outperforms representative prompt optimization baselines while substantially reducing optimization cost.
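The dual-memory loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The names `DualMemory`, `MemoryEntry`, and `compose_prompt` are hypothetical, and the token-overlap (Jaccard) similarity is an assumed stand-in for whatever retrieval MemAPO actually uses. The sketch only shows the general shape: successful outcomes are stored as strategy notes, failed ones as error patterns, and both are retrieved to build the prompt for a new query.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


def _tokens(text: str) -> Set[str]:
    # Crude whitespace tokenizer; an assumption made for illustration only.
    return set(text.lower().split())


def _similarity(a: str, b: str) -> float:
    # Jaccard overlap between token sets (stand-in for a real retriever).
    ta, tb = _tokens(a), _tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


@dataclass
class MemoryEntry:
    query: str  # query that produced this entry
    note: str   # strategy template or error pattern text


@dataclass
class DualMemory:
    strategies: List[MemoryEntry] = field(default_factory=list)
    errors: List[MemoryEntry] = field(default_factory=list)

    def update(self, query: str, note: str, correct: bool) -> None:
        # Correct outcomes feed the strategy memory, incorrect ones the error memory.
        target = self.strategies if correct else self.errors
        target.append(MemoryEntry(query, note))

    def retrieve(self, query: str, k: int = 1) -> Tuple[List[str], List[str]]:
        # Return the k most similar notes from each memory.
        def top(entries: List[MemoryEntry]) -> List[str]:
            ranked = sorted(
                entries, key=lambda e: _similarity(query, e.query), reverse=True
            )
            return [e.note for e in ranked[:k]]

        return top(self.strategies), top(self.errors)


def compose_prompt(query: str, memory: DualMemory, k: int = 1) -> str:
    # Combine retrieved strategies and failure patterns with the new query.
    strategies, errors = memory.retrieve(query, k)
    lines: List[str] = []
    if strategies:
        lines.append("Useful strategies:")
        lines += [f"- {s}" for s in strategies]
    if errors:
        lines.append("Mistakes to avoid:")
        lines += [f"- {e}" for e in errors]
    lines.append(f"Question: {query}")
    return "\n".join(lines)
```

In a full system, the notes passed to `update` would presumably come from an LLM reflecting on its own trajectories, and memory editing would also merge or revise entries rather than only append them. Both steps are omitted here.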

Guanbao Liang, Yuanchen Bei, Sheng Zhou, Yuheng Qin, Huan Zhou, Bingxin Jia, Bin Li, Jiajun Bu • 2026

Related benchmarks

Task	Dataset	Metric	Result
Mathematical Problem Solving	Gaokao MathQA	Accuracy	76.9	60
Knowledge Intensive	Gaokao History	Accuracy	83	30
Knowledge Intensive	Gaokao Geography	Accuracy	79.8	20
Logical Reasoning	GeoShape BBH	Accuracy	90	20
Logical Reasoning	GeoShape BBEH	Accuracy	43	20
Mathematical Calculation	AQuA-RAT	Accuracy (AQuA-RAT)	83.1	20
Prompt Optimization	Logical Reasoning, Mathematical Calculation, and Knowledge Intensive tasks Average	Average Performance (%)	70.7	20

