Multi-Agent Reinforcement Learning with Communication-Constrained Priors
About
Communication is an effective means of improving the learning of cooperative policies in multi-agent systems. However, lossy communication is prevalent in most real-world scenarios. Existing multi-agent reinforcement learning methods with communication have limited scalability and robustness, and therefore struggle in complex and dynamic real-world environments. To address these challenges, we propose a generalized communication-constrained model that uniformly characterizes communication conditions across different scenarios. We then use this model as a learning prior to distinguish between lossy and lossless messages in specific scenarios. Additionally, we decouple the impact of lossy and lossless messages on distributed decision-making using a dual mutual information estimator, and introduce a communication-constrained multi-agent reinforcement learning framework that incorporates the quantified impact of communication messages into the global reward. Finally, we validate the effectiveness of our approach across several communication-constrained benchmarks.
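The abstract does not spell out the form of the communication-constrained model. As a purely illustrative sketch, and not the paper's formulation, the snippet below assumes that loss can be modeled as independent message drops plus additive Gaussian noise, and shows how a channel could return a mask separating lossless from lossy messages. The names `ChannelConfig`, `drop_prob`, `noise_std`, and `transmit` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ChannelConfig:
    # Hypothetical parameters; the paper's actual model may differ.
    drop_prob: float = 0.0  # probability that a message is lost entirely
    noise_std: float = 0.0  # std. dev. of additive Gaussian noise


def transmit(
    messages: List[List[float]], cfg: ChannelConfig, rng: random.Random
) -> Tuple[List[List[float]], List[bool]]:
    """Pass each agent's message through an assumed lossy channel.

    Returns the received messages and a mask that is True only for
    messages delivered without any corruption (lossless).
    """
    received: List[List[float]] = []
    lossless: List[bool] = []
    for msg in messages:
        if rng.random() < cfg.drop_prob:
            # Dropped message: the receiver sees zeros.
            received.append([0.0] * len(msg))
            lossless.append(False)
        elif cfg.noise_std > 0.0:
            # Corrupted message: additive Gaussian noise per component.
            received.append([x + rng.gauss(0.0, cfg.noise_std) for x in msg])
            lossless.append(False)
        else:
            # Clean delivery.
            received.append(list(msg))
            lossless.append(True)
    return received, lossless
```

In a setup like this, the mask could serve as the signal a learner uses to treat lossy and lossless messages differently; how the paper actually uses its prior is not described in the abstract.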
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Multi-agent cooperation | Simple_Tag 3 agents | Average Reward | 138 | 42 |
| Multi-agent cooperation | Simple_Tag 9 agents | Avg Reward | 83.7 | 42 |
| Multi-agent cooperation | Simple_Tag 6 agents | Average Reward | 135.3 | 42 |
| Multi-Agent Reinforcement Learning | Simple_Tag Unrestricted MPE (test) | Cumulative Reward | 134.7 | 7 |
| Multi-Agent Reinforcement Learning | Simple_Tag Light MBC (3) MPE (test) | Avg Episode Reward | 133.6 | 7 |
| Multi-Agent Reinforcement Learning | Simple_Tag Medium MBC (6) | Avg Episode Reward | 134.9 | 7 |
| Multi-Agent Reinforcement Learning | Simple_Tag Heavy MBC (8) | Avg Episode Reward | 131.4 | 7 |
| Multi-Agent Reinforcement Learning | Simple_Tag Light DBC (5) | Cumulative Reward | 136.9 | 7 |
| Multi-Agent Reinforcement Learning | Simple_Tag Medium DBC (3) | Cumulative Reward | 135.3 | 7 |
| Multi-Agent Reinforcement Learning | Simple Tag Heavy DBC | Avg Episode Reward | 138 | 7 |