
G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems

About

Large language model (LLM)-powered multi-agent systems (MAS) have demonstrated cognitive and execution capabilities that far exceed those of single LLM agents, yet their capacity for self-evolution remains hampered by underdeveloped memory architectures. On close inspection, we find that prevailing MAS memory mechanisms (1) are overly simplistic and completely disregard the nuanced inter-agent collaboration trajectories, and (2) lack cross-trial and agent-specific customization, in stark contrast to the expressive memory developed for single agents. To bridge this gap, we introduce G-Memory, a hierarchical, agentic memory system for MAS inspired by organizational memory theory. It manages lengthy MAS interactions through a three-tier graph hierarchy: insight, query, and interaction graphs. Upon receiving a new user query, G-Memory performs bi-directional memory traversal to retrieve both $\textit{high-level, generalizable insights}$, which let the system leverage cross-trial knowledge, and $\textit{fine-grained, condensed interaction trajectories}$, which compactly encode prior collaboration experiences. After each task is executed, the entire hierarchy evolves by assimilating the new collaboration trajectories, so that agent teams improve progressively. Extensive experiments across five benchmarks, three LLM backbones, and three popular MAS frameworks show that G-Memory improves success rates in embodied action and accuracy in knowledge QA by up to $20.89\%$ and $10.12\%$, respectively, without any modifications to the original frameworks. Our code is available at https://github.com/bingreeky/GMemory.
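To make the three-tier structure and the bi-directional traversal more concrete, here is a minimal Python sketch under our own assumptions. It is not the GMemory repository's API: the names `HierarchicalMemory`, `QueryNode`, `retrieve`, `update`, `top_k`, `max_steps`, and `link_threshold` are hypothetical. Token-overlap (Jaccard) similarity stands in for whatever retrieval G-Memory actually uses. Insights are supplied by the caller rather than generated by an LLM, and "condensing" a trajectory is approximated by naive truncation to its last few messages.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for embedding-based retrieval."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


@dataclass
class QueryNode:
    query: str
    # Interaction graph stored as directed utterance edges (speaker, listener, message).
    interactions: list[tuple[str, str, str]]
    success: bool
    insight_ids: set[int] = field(default_factory=set)  # upward links to the insight tier
    neighbors: set[int] = field(default_factory=set)    # edges within the query graph


class HierarchicalMemory:
    """Hypothetical three-tier store: insights <- queries -> interaction graphs."""

    def __init__(self, top_k: int = 2, max_steps: int = 3, link_threshold: float = 0.3):
        self.insights: list[str] = []
        self.queries: list[QueryNode] = []
        self.top_k = top_k
        self.max_steps = max_steps
        self.link_threshold = link_threshold

    def retrieve(self, new_query: str) -> tuple[list[str], list[list[tuple[str, str, str]]]]:
        if not self.queries:
            return [], []
        # 1. Find the most similar past queries (sort is stable, so ties keep insertion order).
        ranked = sorted(range(len(self.queries)),
                        key=lambda i: jaccard(new_query, self.queries[i].query),
                        reverse=True)
        seeds = ranked[: self.top_k]
        # 2. Expand one hop along query-graph edges.
        hop = set(seeds)
        for i in seeds:
            hop |= self.queries[i].neighbors
        # 3a. Upward traversal: gather insights linked to any query in the neighborhood.
        insight_ids = sorted({j for i in hop for j in self.queries[i].insight_ids})
        insights = [self.insights[j] for j in insight_ids]
        # 3b. Downward traversal: truncated trajectories of successful seed queries only.
        trajectories = [self.queries[i].interactions[-self.max_steps:]
                        for i in seeds if self.queries[i].success]
        return insights, trajectories

    def update(self, query: str, interactions: list[tuple[str, str, str]],
               success: bool, insight: str | None = None) -> int:
        """Assimilate a finished task: add a query node, link it, attach its insight."""
        new_id = len(self.queries)
        node = QueryNode(query, interactions, success)
        for i, old in enumerate(self.queries):
            if jaccard(query, old.query) >= self.link_threshold:
                node.neighbors.add(i)
                old.neighbors.add(new_id)
        if insight is not None:
            self.insights.append(insight)
            node.insight_ids.add(len(self.insights) - 1)
        self.queries.append(node)
        return new_id


# Usage: two past tasks, then a retrieval for a new, similar query.
mem = HierarchicalMemory(top_k=1)
mem.update("put a clean mug in the coffee machine",
           [("planner", "actor", "find mug"), ("actor", "critic", "mug found"),
            ("critic", "actor", "clean it first"), ("actor", "planner", "done")],
           success=True, insight="Clean objects before placing them.")
mem.update("heat a mug and put it on the table",
           [("planner", "actor", "find mug"), ("actor", "planner", "failed")],
           success=False, insight="Check the microwave is empty.")
insights, trajs = mem.retrieve("put a clean cup in the coffee machine")
```

With `top_k=1`, only the first task is a direct match, but the one-hop expansion through the query graph also surfaces the second task's insight, while only the successful trajectory is returned downward. Keeping insights and trajectories as separate tiers is what lets a retrieval return both kinds of memory at once.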

Guibin Zhang, Muxin Fu, Guancheng Wan, Miao Yu, Kun Wang, Shuicheng Yan (2025)

Related benchmarks

Task                 | Dataset      | Metric       | Result | Rank
---------------------|--------------|--------------|--------|-----
Automated Planning   | PDDL         | Accuracy     | 24.31  | 233
Question Answering   | PopQA        | Accuracy     | 48.96  | 186
Question Answering   | StrategyQA   | Accuracy     | 64.2   | 114
Question Answering   | TriviaQA     | Accuracy     | 74.6   | 85
Code Generation      | BigCodeBench | Accuracy     | 82.67  | 59
Code Generation      | KodCode      | Accuracy     | 50.2   | 38
Agentic Task Success | TextWorld    | Success Rate | 68.8   | 12
Agentic Task Success | ALFWorld     | Success Rate | 74.8   | 12
Agentic Task Success | Baba Is AI   | Success Rate | 26.2   | 12
Agentic Task Success | MiniHack     | Success Rate | 14.2   | 12

(10 of 19 benchmark rows shown.)
