EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium

About

Multi-agent debate (MAD) systems increasingly rely on shared memory to support long-horizon reasoning, but this convenience opens a critical vulnerability: a single corrupted entry can contaminate the downstream memory-augmented reasoning, and debate alone fails to filter such errors. Existing safeguards filter entries via heuristics or LLM-based validation, yet they rely on AI judgments that share the same failure modes and overlook the cross-agent dynamics of MAD. We address this gap by formulating memory updating in MAD as a zero-trust memory game, in which no agent is assumed honest and the game's equilibrium serves as an indicator of optimal memory trust. Guided by this equilibrium, we propose EquiMem, an inference-time calibration mechanism that quantifies each update algorithmically against the shared memory state, using agents' existing retrieval queries and traversal paths as evidence rather than soliciting any LLM judgment. EquiMem instantiates calibration for both embedding- and graph-based memory, and across diverse benchmarks, MAD frameworks, and memory architectures, it consistently outperforms existing safeguards, remains robust under adversarial agents, and incurs negligible inference overhead.

Yuqiao Meng, Sakshi Sunil Narvekar, Luoxi Tang, Rupali Rajendra Vaje, Yingxue Zhang, Muchao Ye, Zhaohan Xi • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Web Navigation and Shopping | Webshop | Score: 60.1 | 153 |
| Embodied Task Completion | AlfWorld | Success Rate: 84.4 | 96 |
| Long-context Management | Locomo | F1 Score: 65.2 | 57 |
| Latency overhead measurement for memory safeguards | MemBank + HotpotQA | Added Latency (s): 0.9 | 5 |
| Latency overhead measurement for memory safeguards | G-Memory + ALFWorld | Added Latency (s): 3.9 | 4 |
