Decomposing Communication Gain and Delay Cost Under Cross-Timestep Delays in Cooperative Multi-Agent Reinforcement Learning
About
Communication is essential for coordination in \emph{cooperative} multi-agent reinforcement learning under partial observability, yet \emph{cross-timestep} delays cause messages to arrive multiple timesteps after generation, inducing temporal misalignment and making information stale when consumed. We formalize this setting as a delayed-communication partially observable Markov game (DeComm-POMG) and decompose a message's effect into \emph{communication gain} and \emph{delay cost}, yielding the Communication Gain and Delay Cost (CGDC) metric. We further establish a value-loss bound showing that the degradation induced by delayed messages is upper-bounded by a discounted accumulation of an information gap between the action distributions induced by timely versus delayed messages. Guided by CGDC, we propose \textbf{CDCMA}, an actor--critic framework that requests messages only when predicted CGDC is positive, predicts future observations to reduce misalignment at consumption, and fuses delayed messages via CGDC-guided attention. Experiments on no-teammate-vision variants of Cooperative Navigation and Predator Prey, and on SMAC maps across multiple delay levels show consistent improvements in performance, robustness, and generalization, with ablations validating each component.
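The request-gating rule described above can be sketched in a few lines: a message's predicted effect decomposes into a communication gain minus a delay cost, and an agent requests a message only when the predicted CGDC is positive. The estimator functions below are illustrative stand-ins (simple scalar predictions with a linear staleness penalty), not the paper's learned networks.

```python
# Minimal sketch of CGDC-gated message requests, assuming hypothetical
# scalar estimators. In CDCMA these quantities would come from learned
# value/prediction heads; here they are plain inputs for illustration.

def communication_gain(value_with_msg: float, value_without_msg: float) -> float:
    """Predicted value improvement from consuming a timely message."""
    return value_with_msg - value_without_msg


def delay_cost(staleness_penalty_per_step: float, delay_steps: int) -> float:
    """Predicted value degradation from a message delayed by `delay_steps`
    timesteps (assumed linear in the delay for this sketch)."""
    return staleness_penalty_per_step * delay_steps


def cgdc(value_with_msg: float, value_without_msg: float,
         staleness_penalty_per_step: float, delay_steps: int) -> float:
    """Communication Gain and Delay Cost: gain minus cost."""
    return (communication_gain(value_with_msg, value_without_msg)
            - delay_cost(staleness_penalty_per_step, delay_steps))


def should_request(value_with_msg: float, value_without_msg: float,
                   staleness_penalty_per_step: float, delay_steps: int) -> bool:
    """Request a teammate's message only when predicted CGDC is positive."""
    return cgdc(value_with_msg, value_without_msg,
                staleness_penalty_per_step, delay_steps) > 0.0


# A short delay leaves the message worth requesting; a long one does not.
print(should_request(1.0, 0.2, 0.1, delay_steps=3))   # gain 0.8, cost 0.3
print(should_request(1.0, 0.2, 0.1, delay_steps=10))  # gain 0.8, cost 1.0
```

The point of the gate is that the same message can flip from useful to harmful purely as a function of its delay, which is exactly the trade-off the CGDC metric makes explicit.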
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Cooperative Navigation | Cooperative Navigation easy | Mean Episode Reward | -1.7 | 14 |
| Cooperative Navigation | multi-agent particle environment medium | Average Return | -1.8 | 7 |
| Cooperative Navigation | Cooperative Navigation hard | Mean Episode Reward | -1.78 | 7 |
| Cooperative Navigation | Cooperative Navigation super_hard | Mean Episode Reward | -1.89 | 7 |
| Multi-agent cooperation | SMAC 1o_2r_vs_4r (easy) | Win Rate | 81.25 | 7 |
| Multi-agent cooperation | SMAC 1o_2r_vs_4r medium | Win Rate | 79.72 | 7 |
| Multi-agent cooperation | SMAC 1o_2r_vs_4r hard | Win Rate | 68.81 | 7 |
| Multi-agent cooperation | SMAC 1o_2r_vs_4r super_hard | Win Rate | 63.28 | 7 |
| Multi-agent cooperation | SMAC 1o_10b_vs_1r (easy) | Win Rate | 81.88 | 7 |
| Multi-agent cooperation | SMAC 1o_10b_vs_1r medium | Win Rate | 79.79 | 7 |