# MemGen: Weaving Generative Latent Memory for Self-Evolving Agents

## About
Agent memory shapes how Large Language Model (LLM)-powered agents, akin to the human brain, progressively refine themselves through environment interactions. Existing paradigms remain constrained: parametric memory forcibly adjusts model parameters, and retrieval-based memory externalizes experience into structured databases, yet neither captures the fluid interweaving of reasoning and memory that underlies human cognition. To address this gap, we propose MemGen, a dynamic generative memory framework that equips agents with a human-esque cognitive faculty. It consists of a *memory trigger*, which monitors the agent's reasoning state to decide explicit memory invocation, and a *memory weaver*, which takes the agent's current state as a stimulus to construct a latent token sequence as machine-native memory that enriches its reasoning. In this way, MemGen enables agents to recall and augment latent memory throughout reasoning, producing a tightly interwoven cycle of memory and cognition. Extensive experiments across eight benchmarks show that MemGen surpasses leading external memory systems such as ExpeL and AWM by up to 38.22%, exceeds GRPO by up to 13.44%, and exhibits strong cross-domain generalization. More importantly, we find that without explicit supervision, MemGen spontaneously evolves distinct human-like memory faculties, including planning memory, procedural memory, and working memory, suggesting an emergent trajectory toward more naturalistic forms of machine cognition.
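The trigger/weaver control loop described above can be sketched in a few lines. This is a minimal illustrative mock-up, not the paper's implementation: the class names, the uncertainty threshold, and the string stand-ins for latent tokens are all assumptions — in MemGen both modules are learned components operating on the LLM's hidden states rather than on plain strings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemoryTrigger:
    """Decides at each reasoning step whether to invoke memory.

    A real trigger is a learned classifier over the reasoning state;
    here we use a hypothetical scalar-uncertainty threshold instead.
    """
    threshold: float = 0.5

    def should_invoke(self, uncertainty: float) -> bool:
        return uncertainty > self.threshold


@dataclass
class MemoryWeaver:
    """Maps the current reasoning state to a latent token sequence.

    Stand-in for a learned generator: it derives placeholder "latent"
    tokens deterministically from the state string.
    """
    latent_len: int = 4  # hypothetical number of latent memory tokens

    def weave(self, state: str) -> List[str]:
        return [f"<mem:{abs(hash(state + str(i))) % 97}>"
                for i in range(self.latent_len)]


def reasoning_loop(steps: List[str], uncertainties: List[float]) -> List[str]:
    """Interleave reasoning steps with on-demand latent memory recall."""
    trigger, weaver = MemoryTrigger(), MemoryWeaver()
    trace: List[str] = []
    for step, u in zip(steps, uncertainties):
        if trigger.should_invoke(u):
            # Enrich the context with generated latent memory before the step.
            trace.extend(weaver.weave(step))
        trace.append(step)
    return trace


trace = reasoning_loop(["plan", "solve", "verify"], [0.9, 0.2, 0.7])
print(trace)
```

In this toy run, memory is woven in before the high-uncertainty "plan" and "verify" steps but skipped for the confident "solve" step, mirroring the paper's idea that memory invocation is decided per reasoning state rather than applied uniformly.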
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy | 71.12 | 983 |
| Mathematical Reasoning | GSM8K (test) | Accuracy | 85.42 | 797 |
| Mathematical Reasoning | MATH | Accuracy | 50.95 | 535 |
| Mathematical Reasoning | MATH (test) | Overall Accuracy | 60.23 | 433 |
| Question Answering | NarrativeQA (test) | ROUGE-L | 63.94 | 61 |
| Science Reasoning | GPQA (test) | Accuracy | 21.68 | 41 |
| Code Generation | KodCode | Accuracy | 57.7 | 38 |
| Long Document Summarization | BookSum (test) | ROUGE-1 | 12.86 | 37 |
| Question Answering | WikiHop (test) | Accuracy | 41.35 | 32 |
| Question Answering | Merged QA (HotpotQA, NarrativeQA, WikiHop) (test) | Accuracy | 54.56 | 24 |