Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks

About

Long-context Large Language Models, despite their expanded capacity, require careful working memory management to mitigate attention dilution during long-horizon tasks. Yet existing approaches rely on external mechanisms that lack awareness of the agent's reasoning state, leading to suboptimal decisions. We propose Memory-as-Action (MemAct), a framework that treats working memory management as learnable policy actions. By formulating context management as in-place editing operations (deletion, insertion), MemAct enables joint optimization of information retention and task performance through end-to-end reinforcement learning. To address the computational challenges of dynamic context updates, we introduce Dynamic Context Policy Optimization, which restores training efficiency without compromising reasoning integrity. Experiments show that MemAct-RL-14B matches the accuracy of models 16× larger while reducing average context length by 51%, with learned strategies that adapt to model capabilities and generalize across task complexities.
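
The in-place editing operations mentioned in the abstract (deletion, insertion) can be pictured with a small toy model. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the names `Segment` and `WorkingContext`, the segment-id scheme, and the whitespace token count are all hypothetical, and no learned policy or reinforcement learning is involved.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One entry in the agent's working context (hypothetical structure)."""
    seg_id: int
    text: str


@dataclass
class WorkingContext:
    """Toy working memory that is edited in place (illustrative only)."""
    segments: list[Segment] = field(default_factory=list)
    _next_id: int = 0

    def append(self, text: str) -> int:
        """Add a new observation or reasoning step and return its id."""
        seg = Segment(self._next_id, text)
        self.segments.append(seg)
        self._next_id += 1
        return seg.seg_id

    def delete(self, seg_ids: set[int]) -> None:
        """Memory action: drop segments that are no longer needed."""
        self.segments = [s for s in self.segments if s.seg_id not in seg_ids]

    def insert(self, position: int, text: str) -> int:
        """Memory action: insert new text (e.g. a summary) at a position."""
        seg = Segment(self._next_id, text)
        self.segments.insert(position, seg)
        self._next_id += 1
        return seg.seg_id

    def length(self) -> int:
        """Context length in whitespace-separated words (a crude proxy)."""
        return sum(len(s.text.split()) for s in self.segments)


# Usage: replace two raw search results with one short summary.
ctx = WorkingContext()
a = ctx.append("search result one with many irrelevant details")
b = ctx.append("search result two with the key fact")
before = ctx.length()

ctx.delete({a, b})                       # deletion action
ctx.insert(0, "key fact from search")    # insertion action
after = ctx.length()
```

In this toy setup the choice of which segments to delete and what to insert is hard-coded; in the framework described above, those choices would be actions emitted by the trained policy itself.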

Yuxiang Zhang, Jiangming Shu, Ye Ma, Xueyuan Lin, Shangxi Wu, Jitao Sang • 2025

Related benchmarks

Task | Dataset | Result | Rank
Web Search | BrowseComp+ | Pass@3: 50.67 | 60
Long-context Question Answering | Locomo | Single-Hop LLJ Score: 47.8 | 45
Multi-objective task | Multi-Objective Tasks | Accuracy (2-obj): 67.2 | 10
Single-Objective Task | Single-Objective Tasks (2WikiMultihopQA, HotpotQA, Bamboogle, Frames, BrowseComp-Plus) | Accuracy (2WikiMultihopQA): 76.7 | 9