G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems

About

Large language model (LLM)-powered multi-agent systems (MAS) have demonstrated cognitive and execution capabilities that far exceed those of single LLM agents, yet their capacity for self-evolution remains hampered by underdeveloped memory architectures. Upon close inspection, we are alarmed to discover that prevailing MAS memory mechanisms (1) are overly simplistic, completely disregarding the nuanced inter-agent collaboration trajectories, and (2) lack cross-trial and agent-specific customization, in stark contrast to the expressive memory developed for single agents. To bridge this gap, we introduce G-Memory, a hierarchical, agentic memory system for MAS inspired by organizational memory theory, which manages the lengthy MAS interaction via a three-tier graph hierarchy: insight, query, and interaction graphs. Upon receiving a new user query, G-Memory performs bi-directional memory traversal to retrieve both high-level, generalizable insights that enable the system to leverage cross-trial knowledge, and fine-grained, condensed interaction trajectories that compactly encode prior collaboration experiences. Upon task execution, the entire hierarchy evolves by assimilating new collaborative trajectories, nurturing the progressive evolution of agent teams. Extensive experiments across five benchmarks, three LLM backbones, and three popular MAS frameworks demonstrate that G-Memory improves success rates in embodied action and accuracy in knowledge QA by up to 20.89% and 10.12%, respectively, without any modifications to the original frameworks. Our code is available at https://github.com/bingreeky/GMemory.
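The three-tier idea described above can be illustrated with a toy sketch. This is not the G-Memory implementation: the class `ToyHierarchy`, its method names, the word-overlap similarity, and the example data are all hypothetical, chosen only to show how retrieval could start at a matched past query, move upward to insights, move downward to stored trajectories, and then grow after a task finishes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class ToyHierarchy:
    """Hypothetical three-tier store: insights (top), queries (middle), interactions (bottom)."""

    # Past query text -> condensed interaction trajectory recorded for it.
    interactions: Dict[str, List[str]] = field(default_factory=dict)
    # Insight text -> set of past queries that support it.
    insights: Dict[str, Set[str]] = field(default_factory=dict)

    @staticmethod
    def _similarity(a: str, b: str) -> float:
        # Jaccard word overlap: an assumed stand-in for a real similarity measure.
        wa, wb = set(a.lower().split()), set(b.lower().split())
        union = wa | wb
        return len(wa & wb) / len(union) if union else 0.0

    def retrieve(self, new_query: str, k: int = 1) -> Tuple[List[str], Dict[str, List[str]]]:
        # Query tier: pick the k most similar past queries.
        ranked = sorted(
            self.interactions,
            key=lambda q: self._similarity(new_query, q),
            reverse=True,
        )
        hits = set(ranked[:k])
        # Upward traversal: insights supported by any matched query.
        up = sorted(text for text, qs in self.insights.items() if qs & hits)
        # Downward traversal: stored trajectories of the matched queries.
        down = {q: self.interactions[q] for q in ranked[:k]}
        return up, down

    def assimilate(self, query: str, trajectory: List[str], insight: Optional[str] = None) -> None:
        # After a task finishes, record its trajectory and link any new insight to it.
        self.interactions[query] = trajectory
        if insight is not None:
            self.insights.setdefault(insight, set()).add(query)


# Usage with made-up data.
memory = ToyHierarchy()
memory.assimilate(
    "put a clean mug on the desk",
    ["planner: find mug", "actor: wash mug", "actor: place mug on desk"],
    insight="clean objects before placing them",
)
memory.assimilate(
    "solve a quadratic equation",
    ["solver: apply quadratic formula"],
    insight="check the discriminant first",
)
found_insights, found_trajectories = memory.retrieve("put a clean plate on the desk")
```

In this sketch the query tier acts as the entry point, so a single similarity lookup yields both the generalizable insight and the concrete trajectory attached to the closest past task.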

Guibin Zhang, Muxin Fu, Guancheng Wan, Miao Yu, Kun Wang, Shuicheng Yan • 2025

Related benchmarks

Task	Dataset	Metric	Result
Interactive Decision-making	AlfWorld	Overall Success Rate	96.69	295
Code Generation	MBPP+	Accuracy	85.75	236
Automated Planning	PDDL	Accuracy	24.31	233
General Reasoning	BBH	Accuracy	63.72	190
Question Answering	PopQA	Accuracy	48.96	186
Mathematical Reasoning	AIME 24/25	Accuracy	38.33	171
Question Answering	StrategyQA	Accuracy	64.2	123
Question Answering	TriviaQA	Accuracy	74.6	117
Embodied Task Completion	AlfWorld	Success Rate	89.17	96
Complex Reasoning	BBH	Accuracy	89.2	85

Showing 10 of 38 rows
