Mem-α: Learning Memory Construction via Reinforcement Learning

About

Large language model (LLM) agents are constrained by limited context windows, necessitating external memory systems for long-term information understanding. Current memory-augmented agents typically depend on pre-defined instructions and tools for memory updates. However, language models may lack the ability to determine which information to store, how to structure it, and when to update it, especially as memory systems become more complex. This results in suboptimal memory construction and information loss. To address this, we propose Mem-α, a reinforcement learning framework that trains agents to effectively manage complex memory systems through interaction and feedback. We also construct a specialized training dataset spanning diverse multi-turn interaction patterns, paired with comprehensive evaluation questions designed to teach effective memory management. During training, agents process information sequentially in chunks, learning to extract and store relevant content and to update the memory system accordingly. The reward signal derives from downstream question-answering accuracy over the full interaction history, directly optimizing for memory construction. To illustrate the effectiveness of our training framework, we design a memory architecture comprising core, episodic, and semantic components, equipped with multiple tools for memory operations. Empirical evaluation demonstrates that Mem-α achieves significant improvements over existing memory-augmented agent baselines. Despite being trained exclusively on instances with a maximum length of 30k tokens, our agents generalize to sequences exceeding 400k tokens, more than 13× the training length, highlighting the robustness of Mem-α.
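The abstract describes a loop in which an agent reads chunks one at a time, edits a core/episodic/semantic memory through tools, and is rewarded by question-answering accuracy once the whole history has been processed. The Python sketch below illustrates only how such a rollout could be scored, under loudly stated assumptions: the `Memory` class, the tool names (`update_core`, `insert_semantic`, `insert_episodic`), the keyword-based `search`, `toy_policy`, and `rollout_reward` are all hypothetical stand-ins, not the authors' code. The trained LLM is replaced by a rule-based heuristic, the answering step by naive keyword retrieval, and no reinforcement-learning update is shown.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


def tokens(text: str) -> set[str]:
    """Lower-cased word tokens used for naive keyword retrieval (an assumption)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


@dataclass
class Memory:
    """Hypothetical three-part memory: core summary, semantic facts, episodic events."""

    core: str = ""
    semantic: list[str] = field(default_factory=list)
    episodic: list[tuple[int, str]] = field(default_factory=list)  # (chunk index, event)

    # Hypothetical tools an agent could call to edit each component.
    def update_core(self, text: str) -> None:
        self.core = text

    def insert_semantic(self, fact: str) -> None:
        if fact not in self.semantic:  # skip exact duplicates
            self.semantic.append(fact)

    def insert_episodic(self, step: int, event: str) -> None:
        self.episodic.append((step, event))

    def search(self, query: str) -> list[str]:
        """Return every entry that shares at least one token with the query."""
        entries = [self.core, *self.semantic, *(event for _, event in self.episodic)]
        q = tokens(query)
        return [e for e in entries if q & tokens(e)]


# A policy receives (memory, chunk index, chunk text) and edits memory in place.
Policy = Callable[[Memory, int, str], None]


def toy_policy(mem: Memory, step: int, chunk: str) -> None:
    """Rule-based stand-in for the trained LLM agent (hypothetical heuristic)."""
    for sentence in filter(None, (s.strip() for s in chunk.split("."))):
        if " is " in sentence:  # treat "X is Y" sentences as durable facts
            mem.insert_semantic(sentence)
        else:  # everything else is logged as an event
            mem.insert_episodic(step, sentence)
    mem.update_core(f"Processed {step + 1} chunk(s).")


def rollout_reward(policy: Policy, chunks: list[str], qa: list[tuple[str, str]]) -> float:
    """Build memory chunk by chunk, then return QA accuracy as the episode reward."""
    mem = Memory()
    for step, chunk in enumerate(chunks):
        policy(mem, step, chunk)
    correct = 0
    for question, answer in qa:
        retrieved = " ".join(mem.search(question)).lower()
        if answer.lower() in retrieved:
            correct += 1
    return correct / len(qa) if qa else 0.0


if __name__ == "__main__":
    chunks = ["Alice's favorite color is blue. Bob moved to Paris.", "Bob adopted a cat."]
    qa = [
        ("What is Alice's favorite color?", "blue"),
        ("Where did Bob move?", "Paris"),
        ("What pet did Bob adopt?", "cat"),
        ("Where does Carol live?", "Rome"),  # never stored, so it is missed
    ]
    print(rollout_reward(toy_policy, chunks, qa))  # → 0.75
```

Because the reward is computed only after every chunk has been processed, a policy in this setup is credited for storing exactly the information that later questions depend on; in actual training, that scalar would drive the policy update of the LLM agent rather than being printed.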

Yu Wang, Ryuichi Takanobu, Zhiqi Liang, Yuzhen Mao, Yuanzhe Hu, Julian McAuley, Xiaojian Wu (2025)

Related benchmarks

Task	Dataset	Metric	Result	
Query Answering	PersonaMem 32K context length	Query-Answering Accuracy	62	60
Query Answering	PersonaMem 128K context length	Query-Answering Accuracy	0.67	60
Query Answering	PersonaMem 1M context length	Query-Answering Accuracy	63	38
Question Answering	2Wiki 100K context	Accuracy	50	25
Multiple-choice Query Answering	PersonaMem (Average)	Accuracy	64	22
Question Answering	HotpotQA 10K context	Accuracy	75	19
Question Answering	NQ 10K context	Accuracy	47.9	19
Question Answering	2Wiki 10K context	Accuracy	47.7	19
Question Answering	2Wiki 30K context	Accuracy	40.1	19
Question Answering	Average 10K context	Accuracy	47.3	19

Showing 10 of 35 rows
