Decomposing Communication Gain and Delay Cost Under Cross-Timestep Delays in Cooperative Multi-Agent Reinforcement Learning
About
Communication is essential for coordination in \emph{cooperative} multi-agent reinforcement learning under partial observability, yet \emph{cross-timestep} delays cause messages to arrive multiple timesteps after generation, inducing temporal misalignment and making information stale when consumed. We formalize this setting as a delayed-communication partially observable Markov game (DeComm-POMG) and decompose a message's effect into \emph{communication gain} and \emph{delay cost}, yielding the Communication Gain and Delay Cost (CGDC) metric. We further establish a value-loss bound showing that the degradation induced by delayed messages is upper-bounded by a discounted accumulation of an information gap between the action distributions induced by timely versus delayed messages. Guided by CGDC, we propose \textbf{CDCMA}, an actor--critic framework that requests messages only when predicted CGDC is positive, predicts future observations to reduce misalignment at consumption, and fuses delayed messages via CGDC-guided attention. Experiments on no-teammate-vision variants of Cooperative Navigation and Predator Prey, and on SMAC maps across multiple delay levels show consistent improvements in performance, robustness, and generalization, with ablations validating each component.
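The request-gating rule described above can be sketched in a few lines: a message's predicted effect decomposes into a communication gain minus a delay cost, and an agent requests a message only when the predicted CGDC is positive. The estimator functions below are illustrative stand-ins (simple scalar predictions with a linear staleness penalty), not the paper's learned networks.

```python
# Minimal sketch of CGDC-gated message requests, assuming hypothetical
# scalar estimators. In CDCMA these quantities would come from learned
# value/prediction heads; here they are plain inputs for illustration.

def communication_gain(value_with_msg: float, value_without_msg: float) -> float:
    """Predicted value improvement from consuming a timely message."""
    return value_with_msg - value_without_msg


def delay_cost(staleness_penalty_per_step: float, delay_steps: int) -> float:
    """Predicted value degradation from a message delayed by `delay_steps`
    timesteps (assumed linear in the delay for this sketch)."""
    return staleness_penalty_per_step * delay_steps


def cgdc(value_with_msg: float, value_without_msg: float,
         staleness_penalty_per_step: float, delay_steps: int) -> float:
    """Communication Gain and Delay Cost: gain minus cost."""
    return (communication_gain(value_with_msg, value_without_msg)
            - delay_cost(staleness_penalty_per_step, delay_steps))


def should_request(value_with_msg: float, value_without_msg: float,
                   staleness_penalty_per_step: float, delay_steps: int) -> bool:
    """Request a teammate's message only when predicted CGDC is positive."""
    return cgdc(value_with_msg, value_without_msg,
                staleness_penalty_per_step, delay_steps) > 0.0


# A short delay leaves the message worth requesting; a long one does not.
print(should_request(1.0, 0.2, 0.1, delay_steps=3))   # gain 0.8, cost 0.3
print(should_request(1.0, 0.2, 0.1, delay_steps=10))  # gain 0.8, cost 1.0
```

The point of the gate is that the same message can flip from useful to harmful purely as a function of its delay, which is exactly the trade-off the CGDC metric makes explicit.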
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Cooperative Navigation | Cooperative Navigation easy | Mean Episode Reward | -1.7 | 14 |
| Cooperative Navigation | multi-agent particle environment medium | Average Return | -1.8 | 7 |
| Cooperative Navigation | Cooperative Navigation hard | Mean Episode Reward | -1.78 | 7 |
| Cooperative Navigation | Cooperative Navigation super_hard | Mean Episode Reward | -1.89 | 7 |
| Multi-agent cooperation | SMAC 1o_2r_vs_4r (easy) | Win Rate | 81.25 | 7 |
| Multi-agent cooperation | SMAC 1o_2r_vs_4r medium | Win Rate | 79.72 | 7 |
| Multi-agent cooperation | SMAC 1o_2r_vs_4r hard | Win Rate | 68.81 | 7 |
| Multi-agent cooperation | SMAC 1o_2r_vs_4r super_hard | Win Rate | 63.28 | 7 |
| Multi-agent cooperation | SMAC 1o_10b_vs_1r (easy) | Win Rate | 81.88 | 7 |
| Multi-agent cooperation | SMAC 1o_10b_vs_1r medium | Win Rate | 79.79 | 7 |