
Decomposing Communication Gain and Delay Cost Under Cross-Timestep Delays in Cooperative Multi-Agent Reinforcement Learning

About

Communication is essential for coordination in cooperative multi-agent reinforcement learning under partial observability, yet cross-timestep delays cause messages to arrive multiple timesteps after generation, inducing temporal misalignment and making information stale when consumed. We formalize this setting as a delayed-communication partially observable Markov game (DeComm-POMG) and decompose a message's effect into communication gain and delay cost, yielding the Communication Gain and Delay Cost (CGDC) metric. We further establish a value-loss bound showing that the degradation induced by delayed messages is upper-bounded by a discounted accumulation of an information gap between the action distributions induced by timely versus delayed messages. Guided by CGDC, we propose CDCMA, an actor-critic framework that requests messages only when predicted CGDC is positive, predicts future observations to reduce misalignment at consumption, and fuses delayed messages via CGDC-guided attention. Experiments on no-teammate-vision variants of Cooperative Navigation and Predator Prey, and on SMAC maps across multiple delay levels, show consistent improvements in performance, robustness, and generalization, with ablations validating each component.
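The abstract's gating idea can be illustrated with a minimal sketch: an agent requests a message only when predicted communication gain exceeds predicted delay cost, i.e., when predicted CGDC is positive. All names below are hypothetical, and the linear delay-cost model is an illustrative assumption, not the paper's definition.

```python
# Hypothetical sketch of CGDC-gated message requests, based only on the
# abstract. The predictors, weights, and the assumption that delay cost
# grows linearly with delay are illustrative, not the paper's method.
from dataclasses import dataclass


@dataclass
class CGDCGate:
    gain_weight: float = 1.0   # scales predicted communication gain
    cost_weight: float = 1.0   # scales predicted delay cost

    def predicted_cgdc(self, predicted_gain: float, delay_steps: int,
                       per_step_cost: float) -> float:
        # CGDC = communication gain minus delay cost; here delay cost is
        # assumed to accumulate per timestep of message delay.
        gain = self.gain_weight * predicted_gain
        cost = self.cost_weight * per_step_cost * delay_steps
        return gain - cost

    def should_request(self, predicted_gain: float, delay_steps: int,
                       per_step_cost: float = 0.1) -> bool:
        # Request a message only when predicted CGDC is positive.
        return self.predicted_cgdc(predicted_gain, delay_steps,
                                   per_step_cost) > 0


gate = CGDCGate()
print(gate.should_request(predicted_gain=0.5, delay_steps=2))  # True:  0.5 > 0.2
print(gate.should_request(predicted_gain=0.1, delay_steps=5))  # False: 0.1 < 0.5
```

In the paper's framework the gain and cost would come from learned predictors inside the actor-critic architecture; this sketch only shows the thresholding logic implied by "requests messages only when predicted CGDC is positive."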

Zihong Gao, Hongjian Liang, Lei Hao, Liangjun Ke • 2026

Related benchmarks

Task                     Dataset                                   Metric               Result   Rank
Cooperative Navigation   Cooperative Navigation easy               Mean Episode Reward  -1.7     14
Cooperative Navigation   multi-agent particle environment medium   Average Return       -1.8     7
Cooperative Navigation   Cooperative Navigation hard               Mean Episode Reward  -1.78    7
Cooperative Navigation   Cooperative Navigation super_hard         Mean Episode Reward  -1.89    7
Multi-agent cooperation  SMAC 1o_2r_vs_4r (easy)                   Win Rate             81.25    7
Multi-agent cooperation  SMAC 1o_2r_vs_4r medium                   Win Rate             79.72    7
Multi-agent cooperation  SMAC 1o_2r_vs_4r hard                     Win Rate             68.81    7
Multi-agent cooperation  SMAC 1o_2r_vs_4r super_hard               Win Rate             63.28    7
Multi-agent cooperation  SMAC 1o_10b_vs_1r (easy)                  Win Rate             81.88    7
Multi-agent cooperation  SMAC 1o_10b_vs_1r medium                  Win Rate             79.79    7

(Showing 10 of 19 rows.)
