Deep Coordination Graphs
About
This paper introduces the deep coordination graph (DCG) for collaborative multi-agent reinforcement learning. DCG strikes a flexible trade-off between representational capacity and generalization by factoring the joint value function of all agents according to a coordination graph into payoffs between pairs of agents. The value can be maximized by local message passing along the graph, which allows training of the value function end-to-end with Q-learning. Payoff functions are approximated with deep neural networks that employ parameter sharing and low-rank approximations to significantly improve sample efficiency. We show that DCG can solve predator-prey tasks that highlight the relative overgeneralization pathology, as well as challenging StarCraft II micromanagement tasks.
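As a rough illustration of the factorization and message passing described above, the sketch below scores a joint action as a sum of per-agent utilities and pairwise payoffs on the graph edges, then searches for the maximizing joint action with max-plus message passing. It is a toy under stated assumptions, not the paper's implementation: the names `joint_value` and `max_plus`, the per-agent utility terms, the hand-written payoff tables (which stand in for the neural-network outputs), and the fixed iteration count are all hypothetical choices made for illustration. Max-plus is exact on tree-shaped graphs such as the three-agent chain used here and is only approximate on graphs with cycles.

```python
import itertools


def joint_value(utilities, payoffs, actions):
    """Factored joint value: per-agent utilities plus pairwise payoffs on edges."""
    total = sum(utilities[i][a] for i, a in enumerate(actions))
    for (i, j), table in payoffs.items():
        total += table[actions[i]][actions[j]]
    return total


def max_plus(utilities, payoffs, n_actions, n_iters=10):
    """Search for the maximizing joint action by max-plus message passing."""
    n_agents = len(utilities)
    # msgs[(i, j)][a_j]: what agent i reports to neighbour j about j's action a_j.
    msgs = {}
    for (i, j) in payoffs:
        msgs[(i, j)] = [0.0] * n_actions
        msgs[(j, i)] = [0.0] * n_actions

    def pay(i, j, ai, aj):
        # Look up the payoff regardless of how the edge was oriented when stored.
        if (i, j) in payoffs:
            return payoffs[(i, j)][ai][aj]
        return payoffs[(j, i)][aj][ai]

    for _ in range(n_iters):
        new = {}
        for (i, j) in msgs:
            # Agent i's local evidence, excluding what j itself sent to i.
            local = [
                utilities[i][ai]
                + sum(m[ai] for (k, t), m in msgs.items() if t == i and k != j)
                for ai in range(n_actions)
            ]
            out = [
                max(local[ai] + pay(i, j, ai, aj) for ai in range(n_actions))
                for aj in range(n_actions)
            ]
            mean = sum(out) / n_actions
            new[(i, j)] = [v - mean for v in out]  # subtract the mean to keep messages bounded
        msgs = new

    # Each agent picks the action with the best utility plus incoming messages.
    actions = []
    for i in range(n_agents):
        score = [
            utilities[i][a] + sum(m[a] for (k, t), m in msgs.items() if t == i)
            for a in range(n_actions)
        ]
        actions.append(max(range(n_actions), key=score.__getitem__))
    return tuple(actions)


# Hypothetical toy problem: three agents on a chain 0 - 1 - 2, two actions each.
utilities = [[0.0, 0.5], [0.2, 0.0], [0.0, 0.1]]
payoffs = {
    (0, 1): [[1.0, 0.0], [0.0, 2.0]],
    (1, 2): [[0.0, 1.5], [3.0, 0.0]],
}

best = max_plus(utilities, payoffs, n_actions=2)
# Exhaustive search over all joint actions, feasible only for tiny problems.
brute = max(
    itertools.product(range(2), repeat=3),
    key=lambda a: joint_value(utilities, payoffs, a),
)
```

On this chain the message-passing result `best` matches the exhaustive search `brute`, while each agent only exchanges messages with its graph neighbours instead of enumerating every joint action.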
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Multi-Agent Reinforcement Learning | Simple Spread N=6 | Collisions | 0.1425 | 23 |
| Multi-Agent Reinforcement Learning | Simple Spread N=3 | Collisions | 0.0322 | 23 |
| Multi-Agent Reinforcement Learning | Simple Spread N=4 | Collisions | 7.56 | 23 |
| Multi-Agent Reinforcement Learning | MPE Speaker-Listener | Return | 21.2 | 17 |
| Cooperative Multi-Agent Reinforcement Learning | Crypto (last 2% of train) | Mean Episodic Reward | 50 | 13 |
| Cooperative Multi-Agent Reinforcement Learning | Reference (last 2% of train) | Mean Episodic Reward | -27.34 | 13 |
| Cooperative Multi-Agent Reinforcement Learning | Disperse (last 2% of train) | Mean Episodic Reward | -1.16 | 13 |
| Cooperative Multi-Agent Reinforcement Learning | Speaker-Listener (last 2% of train) | Mean Episodic Reward | -21.19 | 13 |
| Cooperative Multi-Agent Reinforcement Learning | Adversary (last 2% of train) | Mean Episodic Reward | 40.35 | 13 |
| Multi-Agent Reinforcement Learning | MPE Adversary (test) | Final Test Return | 38.8 | 6 |