Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks

About

Long-context Large Language Models, despite their expanded capacity, require careful working memory management to mitigate attention dilution during long-horizon tasks. Yet existing approaches rely on external mechanisms that lack awareness of the agent's reasoning state, leading to suboptimal decisions. We propose Memory-as-Action (MemAct), a framework that treats working memory management as learnable policy actions. By formulating context management as in-place editing operations (deletion, insertion), MemAct enables joint optimization of information retention and task performance through end-to-end reinforcement learning. To address the computational challenges of dynamic context updates, we introduce Dynamic Context Policy Optimization, which restores training efficiency without compromising reasoning integrity. Experiments show that MemAct-RL-14B matches the accuracy of models 16× larger while reducing average context length by 51%, with learned strategies that adapt to model capabilities and generalize across task complexities.
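The abstract describes context management as in-place editing operations (deletion, insertion) that the agent issues as actions. The sketch below is a minimal illustration under our own assumptions, not the authors' implementation: `Entry`, `WorkingMemory`, `split_into_segments` and the whitespace word count are all hypothetical. It also shows, in simplified form, why such edits complicate training: after a deletion or insertion, the new context is no longer a prefix-extension of the previous one, so a trajectory can no longer be treated as a single append-only sequence. The segmentation helper only illustrates that point; it is not the paper's Dynamic Context Policy Optimization.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entry:
    """One item in working memory (hypothetical structure, not from the paper)."""
    entry_id: int
    role: str  # e.g. "user", "assistant", "tool"
    text: str


@dataclass
class WorkingMemory:
    """Hypothetical context buffer that the policy can edit in place."""
    entries: list[Entry] = field(default_factory=list)
    next_id: int = 0

    def _new_entry(self, role: str, text: str) -> Entry:
        entry = Entry(self.next_id, role, text)
        self.next_id += 1
        return entry

    def append(self, role: str, text: str) -> int:
        # Ordinary agent step: new content goes at the end of the context.
        entry = self._new_entry(role, text)
        self.entries.append(entry)
        return entry.entry_id

    def delete(self, ids: set[int]) -> None:
        # Memory action: remove entries the policy no longer needs.
        self.entries = [e for e in self.entries if e.entry_id not in ids]

    def insert(self, position: int, role: str, text: str) -> int:
        # Memory action: place content (e.g. a note or summary) mid-context.
        entry = self._new_entry(role, text)
        self.entries.insert(position, entry)
        return entry.entry_id

    def render(self) -> str:
        # The prompt the model would actually see at the next step.
        return "\n".join(f"[{e.role}] {e.text}" for e in self.entries)

    def length(self) -> int:
        # Whitespace word count, standing in for a real tokenizer.
        return sum(len(e.text.split()) for e in self.entries)


def is_prefix_extension(old: list[int], new: list[int]) -> bool:
    """True if `new` only appends to `old` (the usual append-only case)."""
    return new[: len(old)] == old


def split_into_segments(snapshots: list[list[int]]) -> list[list[list[int]]]:
    """Group context snapshots so each group is append-only (illustrative only)."""
    segments: list[list[list[int]]] = []
    for snap in snapshots:
        if segments and is_prefix_extension(segments[-1][-1], snap):
            segments[-1].append(snap)
        else:
            # A deletion or insertion broke the prefix: start a new segment.
            segments.append([snap])
    return segments


# Usage: a toy trajectory with one deletion and one insertion.
mem = WorkingMemory()
snapshots: list[list[int]] = []


def record() -> None:
    """Save the entry ids currently in context."""
    snapshots.append([e.entry_id for e in mem.entries])


mem.append("user", "Find the founding year of X")
record()
noisy = mem.append("tool", "long irrelevant search result")
record()
mem.append("tool", "X was founded in 1901")
record()
mem.delete({noisy})
record()
mem.insert(1, "assistant", "Note: search 1 was irrelevant")
record()

segments = split_into_segments(snapshots)
print(snapshots)      # [[0], [0, 1], [0, 1, 2], [0, 2], [0, 3, 2]]
print(len(segments))  # 3
```

In this toy run, the deletion and the insertion each start a new segment, so the five context snapshots fall into three append-only segments instead of one continuous sequence.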

Yuxiang Zhang, Jiangming Shu, Ye Ma, Xueyuan Lin, Shangxi Wu, Jitao Sang• 2025

Related benchmarks

Task	Dataset	Result
Web Search	BrowseComp+	Pass@3: 50.67	60
Long-context Question Answering	Locomo	Single-Hop LLJ Score: 47.8	45
Multi-Objective Task	Multi-Objective Tasks	Accuracy (2-obj): 67.2	10
Single-Objective Task	Single-Objective Tasks (2WikiMultihopQA, HotpotQA, Bamboogle, Frames, BrowseComp-Plus)	Accuracy (2WikiMultihopQA): 76.7	9
