
A-MEM: Agentic Memory for LLM Agents

About

While large language model (LLM) agents can effectively use external tools for complex real-world tasks, they require memory systems to leverage historical experiences. Current memory systems enable basic storage and retrieval but lack sophisticated memory organization, despite recent attempts to incorporate graph databases. Moreover, these systems' fixed operations and structures limit their adaptability across diverse tasks. To address these limitations, this paper proposes a novel agentic memory system for LLM agents that can dynamically organize memories in an agentic way. Following the basic principles of the Zettelkasten method, we designed our memory system to create interconnected knowledge networks through dynamic indexing and linking. When a new memory is added, we generate a comprehensive note containing multiple structured attributes, including contextual descriptions, keywords, and tags. The system then analyzes historical memories to identify relevant connections, establishing links where meaningful similarities exist. This process also enables memory evolution: as new memories are integrated, they can trigger updates to the contextual representations and attributes of existing historical memories, allowing the memory network to continuously refine its understanding. Our approach combines the structured organization principles of Zettelkasten with the flexibility of agent-driven decision making, allowing for more adaptive and context-aware memory management. Empirical experiments on six foundation models show that our approach outperforms existing SOTA baselines. The source code for evaluating performance is available at https://github.com/WujiangXu/A-mem, and the source code of the agentic memory system is available at https://github.com/WujiangXu/A-mem-sys.
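The workflow described above (note construction, similarity-based linking, and memory evolution) can be sketched in a few lines of Python. This is a hypothetical illustration, not the actual A-MEM implementation: the `MemoryNote` and `AgenticMemory` names are invented here, and simple keyword overlap stands in for the LLM-generated attributes and agent-driven link decisions the paper describes.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the Zettelkasten-style memory workflow.
# In the real system an LLM generates note attributes and judges relevance;
# here Jaccard keyword overlap stands in for both steps.

@dataclass
class MemoryNote:
    content: str
    keywords: set
    tags: set = field(default_factory=set)
    links: list = field(default_factory=list)  # indices of linked notes

class AgenticMemory:
    def __init__(self, link_threshold=0.2):
        self.notes = []
        self.link_threshold = link_threshold

    def _similarity(self, a, b):
        # Cheap stand-in for embedding similarity / an LLM relevance check.
        if not a.keywords or not b.keywords:
            return 0.0
        return len(a.keywords & b.keywords) / len(a.keywords | b.keywords)

    def add(self, content, keywords, tags=()):
        note = MemoryNote(content, set(keywords), set(tags))
        new_idx = len(self.notes)
        for idx, old in enumerate(self.notes):
            if self._similarity(note, old) >= self.link_threshold:
                # Dynamic linking: connect the notes in both directions.
                note.links.append(idx)
                old.links.append(new_idx)
                # Memory evolution: the new note refines the old note's
                # attributes (here, simply by merging tags).
                old.tags |= note.tags
        self.notes.append(note)
        return new_idx
```

Adding a second note about a related topic links it to the first and updates the first note's tags, mirroring the "memory evolution" behavior, in which integrating a new memory refines existing ones.
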

Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, Yongfeng Zhang • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multi-hop Question Answering | HotpotQA | F1 Score: 34.83 | 221 |
| Function Calling | BFCL V3 | -- | 88 |
| Long-term memory evaluation | Locomo | Overall F1: 39.65 | 70 |
| Multi-hop Question Answering | Locomo | F1: 32.97 | 67 |
| Long-context Question Answering | Locomo | Average F1: 41.97 | 64 |
| Question Answering | NarrativeQA (test) | ROUGE-L: 55.99 | 61 |
| Long-context Memory Retrieval | Locomo | Single-hop: 40.22 | 55 |
| Single-hop Question Answering | Locomo | F1: 0.4843 | 53 |
| Open-domain Question Answering | Locomo | F1: 0.1745 | 53 |
| Long-context reasoning and retrieval | LoCoMo (test) | Single-Hop F1: 61.47 | 37 |
Showing 10 of 92 rows
