Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory
About
Large language model (LLM) agents require long-term user memory for consistent personalization, but limited context windows make it hard to track evolving preferences over long interactions. Existing memory systems mainly rely on static, hand-crafted update rules. Reinforcement learning (RL)-based agents do learn memory updates, but sparse outcome rewards provide weak supervision, which leads to unstable long-horizon optimization. Drawing on memory schema theory and the functional division between prefrontal and hippocampal regions, we introduce MemCoE, a cognition-inspired two-stage optimization framework that learns how memory should be organized and what information to update. In the first stage, Memory Guideline Induction optimizes a global guideline via contrastive feedback interpreted as textual gradients. In the second stage, Guideline-Aligned Memory Policy Optimization uses the induced guideline to define structured process rewards and performs multi-turn RL to learn a guideline-following memory evolution policy. We evaluate on three personalization memory benchmarks, covering explicit and implicit preferences, different context sizes, and different noise levels, and observe consistent improvements over strong baselines with favorable robustness, transferability, and efficiency.
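The idea of supplementing a sparse outcome reward with guideline-derived process rewards can be illustrated with a toy sketch. This is not the paper's implementation: the `Rule` type, the rule checks, the `weight` parameter, and the way the two reward signals are combined are all hypothetical, chosen only to show how a guideline could give every memory update its own learning signal.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """Hypothetical guideline rule: a named check on one memory update."""
    name: str
    check: Callable[[str], bool]


def process_reward(update: str, guideline: List[Rule]) -> float:
    """Fraction of guideline rules that a single memory update satisfies."""
    if not guideline:
        return 0.0
    return sum(rule.check(update) for rule in guideline) / len(guideline)


def trajectory_rewards(
    updates: List[str],
    guideline: List[Rule],
    outcome: float,
    weight: float = 0.5,  # assumed mixing weight, not taken from the paper
) -> List[float]:
    """Per-turn rewards: a weighted process reward on every turn,
    plus the sparse outcome reward added on the final turn."""
    rewards = [weight * process_reward(u, guideline) for u in updates]
    if rewards:
        rewards[-1] += outcome
    return rewards


# Toy guideline with two illustrative rules (assumptions, not from the paper).
guideline = [
    Rule("has_topic_tag", lambda u: u.startswith("[")),
    Rule("is_concise", lambda u: len(u) <= 80),
]

updates = [
    "[food] now prefers vegetarian meals",   # satisfies both rules
    "user said some things about travel",    # concise, but no topic tag
]

rewards = trajectory_rewards(updates, guideline, outcome=1.0)
print(rewards)
```

With an outcome-only reward, the first update would receive no direct signal. Here each turn is scored against the guideline, so a well-formed update is rewarded even when the final answer is the only thing the benchmark grades.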
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Personalized retrieval and QA over heterogeneous user corpora | PersonaBench w/o Noise | F1 Score | 32.27 | 8 |
| Personalized retrieval and QA over heterogeneous user corpora | PersonaBench Noise Level 0.3 | F1 Score | 29.89 | 8 |
| Personalized retrieval and QA over heterogeneous user corpora | PersonaBench Noise Level 0.5 | F1 Score | 25.99 | 8 |
| Personalized retrieval and QA over heterogeneous user corpora | PersonaBench Noise Level 0.7 | F1 Score | 25.09 | 8 |
| Preference evaluation via multi-choice queries | PrefEval Explicit | Accuracy | 81.3 | 8 |
| Preference evaluation via multi-choice queries | PrefEval Implicit | Accuracy | 69.9 | 8 |
| Preference evolution over long multi-session histories | PersonaMem 32K context scale | Accuracy | 57.06 | 8 |
| Preference evolution over long multi-session histories | PersonaMem 128K context scale | Accuracy | 47.24 | 8 |