Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks
About
Long-context Large Language Models, despite their expanded capacity, require careful working memory management to mitigate attention dilution during long-horizon tasks. Yet existing approaches rely on external mechanisms that lack awareness of the agent's reasoning state, leading to suboptimal decisions. We propose Memory-as-Action (MemAct), a framework that treats working memory management as learnable policy actions. By formulating context management as in-place editing operations (deletion, insertion), MemAct enables joint optimization of information retention and task performance through end-to-end reinforcement learning. To address the computational challenges of dynamic context updates, we introduce Dynamic Context Policy Optimization, which restores training efficiency without compromising reasoning integrity. Experiments show that MemAct-RL-14B matches the accuracy of models 16× larger while reducing average context length by 51%, with learned strategies that adapt to model capabilities and generalize across task complexities.
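To make the idea of context management as in-place editing operations more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the context is an ordered list of text segments and that the policy emits an action naming which segments to delete and what to insert in their place. The names `WorkingMemory`, `MemoryAction`, and `apply_action` are hypothetical and used only for illustration; the reinforcement learning side (including Dynamic Context Policy Optimization) is not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryAction:
    """Hypothetical edit action: segments to delete and one segment to insert."""
    delete: set[int]   # indices of segments to remove from the context
    insert_at: int     # position (after deletion) for the new segment
    text: str          # replacement content, e.g. a condensed note


@dataclass
class WorkingMemory:
    """Hypothetical working memory: an ordered list of context segments."""
    segments: list[str] = field(default_factory=list)

    def length(self) -> int:
        # Whitespace word count as a crude stand-in for token count.
        return sum(len(s.split()) for s in self.segments)


def apply_action(memory: WorkingMemory, action: MemoryAction) -> None:
    """Edit the context in place: delete first, then insert."""
    memory.segments = [
        s for i, s in enumerate(memory.segments) if i not in action.delete
    ]
    memory.segments.insert(action.insert_at, action.text)


memory = WorkingMemory([
    "question: who wrote Dune?",
    "tool output: long page about Dune and its publication history",
    "tool output: unrelated page about sand dunes",
])
before = memory.length()

# Assumed policy output: drop both raw tool outputs, keep a short note.
action = MemoryAction(
    delete={1, 2},
    insert_at=1,
    text="note: Dune was written by Frank Herbert",
)
apply_action(memory, action)
after = memory.length()
```

In this toy setup the edit is just another step the agent takes, so the same policy that decides what to search for next could also decide what to keep in context. The trade-off between retained information and context length is what the abstract describes as being optimized jointly.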
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Web Search | BrowseComp+ | Pass@3 | 50.67 | 60 |
| Long-context Question Answering | Locomo | Single-Hop LLJ Score | 47.8 | 45 |
| Multi-Objective Task | Multi-Objective Tasks | Accuracy (2-obj) | 67.2 | 10 |
| Single-Objective Task | Single-Objective Tasks (2WikiMultihopQA, HotpotQA, Bamboogle, Frames, BrowseComp-Plus) | Accuracy (2WikiMultihopQA) | 76.7 | 9 |