
MemGen: Weaving Generative Latent Memory for Self-Evolving Agents

About

Agent memory shapes how Large Language Model (LLM)-powered agents, akin to the human brain, progressively refine themselves through environment interactions. Existing paradigms remain constrained: parametric memory forcibly adjusts model parameters, and retrieval-based memory externalizes experience into structured databases, yet neither captures the fluid interweaving of reasoning and memory that underlies human cognition. To address this gap, we propose MemGen, a dynamic generative memory framework that equips agents with a human-esque cognitive faculty. It consists of a memory trigger, which monitors the agent's reasoning state to decide explicit memory invocation, and a memory weaver, which takes the agent's current state as stimulus to construct a latent token sequence as machine-native memory to enrich its reasoning. In this way, MemGen enables agents to recall and augment latent memory throughout reasoning, producing a tightly interwoven cycle of memory and cognition. Extensive experiments across eight benchmarks show that MemGen surpasses leading external memory systems such as ExpeL and AWM by up to 38.22%, exceeds GRPO by up to 13.44%, and exhibits strong cross-domain generalization ability. More importantly, we find that without explicit supervision, MemGen spontaneously evolves distinct human-like memory faculties, including planning memory, procedural memory, and working memory, suggesting an emergent trajectory toward more naturalistic forms of machine cognition.
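The trigger-and-weaver cycle described in the abstract can be pictured with a toy sketch. This is not MemGen's implementation: the names `ReasoningState`, `memory_trigger`, `memory_weaver`, and `reasoning_step`, the mean-activation threshold rule, and the scaled-copy "latent tokens" are all hypothetical stand-ins chosen only to show the control flow (check the state, optionally weave latent memory into the context, then continue reasoning).

```python
from dataclasses import dataclass, field
from typing import List

Vector = List[float]


@dataclass
class ReasoningState:
    """Toy stand-in for an agent's reasoning state (hypothetical)."""
    hidden: Vector                                        # current hidden state
    context: List[Vector] = field(default_factory=list)   # sequence seen so far


def memory_trigger(state: ReasoningState, threshold: float = 0.5) -> bool:
    """Hypothetical trigger: invoke memory when mean activation exceeds a threshold.

    The real trigger is a learned component; this rule is an assumption.
    """
    score = sum(state.hidden) / len(state.hidden)
    return score > threshold


def memory_weaver(state: ReasoningState, num_latents: int = 2) -> List[Vector]:
    """Hypothetical weaver: build a short latent sequence from the current state.

    Scaled copies of the hidden state stand in for generated latent tokens.
    """
    return [[h * (i + 1) for h in state.hidden] for i in range(num_latents)]


def reasoning_step(state: ReasoningState, new_hidden: Vector) -> bool:
    """Advance one step; weave latent memory into the context if triggered."""
    state.hidden = new_hidden
    invoked = memory_trigger(state)
    if invoked:
        # Latent memory is inserted before the step's own token.
        state.context.extend(memory_weaver(state))
    state.context.append(new_hidden)
    return invoked


if __name__ == "__main__":
    state = ReasoningState(hidden=[0.0, 0.0])
    for hidden in ([0.1, 0.2], [0.8, 0.9]):
        fired = reasoning_step(state, hidden)
        print(f"trigger fired: {fired}, context length: {len(state.context)}")
```

In this sketch memory is consulted during reasoning rather than once up front, which is the interleaving the abstract emphasizes; everything about how the trigger and weaver are actually trained or parameterized is left out.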

Guibin Zhang, Muxin Fu, Shuicheng Yan • 2025

Related benchmarks

Task | Dataset | Result | Rank
Mathematical Reasoning | GSM8K | Accuracy 71.12 | 983
Mathematical Reasoning | GSM8K (test) | Accuracy 85.42 | 797
Mathematical Reasoning | MATH | Accuracy 50.95 | 535
Mathematical Reasoning | MATH (test) | Overall Accuracy 60.23 | 433
Question Answering | NarrativeQA (test) | ROUGE-L 63.94 | 61
Science Reasoning | GPQA (test) | Accuracy 21.68 | 41
Code Generation | KodCode | Accuracy 57.7 | 38
Long document summarization | BookSum (test) | ROUGE 1 12.86 | 37
Question Answering | Wikihop (test) | Accuracy 41.35 | 32
Question Answering | Merged QA HotpotQA, NarrativeQA, WikiHop (test) | Accuracy 54.56 | 24
Showing 10 of 16 rows
