
Deep Implicit Coordination Graphs for Multi-agent Reinforcement Learning

About

Multi-agent reinforcement learning (MARL) requires coordination to efficiently solve certain tasks. Fully centralized control is often infeasible in such domains due to the size of joint action spaces. Coordination-graph-based formalizations allow reasoning about the joint action based on the structure of interactions. However, they often require domain expertise in their design. This paper introduces the deep implicit coordination graph (DICG) architecture for such scenarios. DICG consists of a module for inferring the dynamic coordination graph structure, which is then used by a graph-neural-network-based module to learn to implicitly reason about the joint actions or values. DICG allows learning the tradeoff between full centralization and decentralization via standard actor-critic methods to significantly improve coordination for domains with a large number of agents. We apply DICG to both centralized-training-centralized-execution and centralized-training-decentralized-execution regimes. We demonstrate that DICG solves the relative overgeneralization pathology in predator-prey tasks and outperforms various MARL baselines on the challenging StarCraft II Multi-agent Challenge (SMAC) and traffic junction environments.

Sheng Li, Jayesh K. Gupta, Peter Morales, Ross Allen, Mykel J. Kochenderfer • 2020
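
The two-module structure described in the abstract (infer a coordination graph, then pass messages over it) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the bilinear attention used to score agent pairs, the tanh encoder, the residual message-passing step, and all names (`infer_graph`, `graph_conv`, `dicg_like_forward`, `init_params`) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def infer_graph(embeddings, w_att):
    """Infer a soft adjacency matrix from agent embeddings.

    ASSUMPTION: pairwise scores come from bilinear attention; each row is
    normalized so agent i's weights over all agents sum to 1.
    """
    scores = embeddings @ w_att @ embeddings.T  # shape (n_agents, n_agents)
    return softmax(scores, axis=-1)


def graph_conv(adjacency, embeddings, w_gc):
    """One message-passing step over the inferred graph, with a residual."""
    messages = adjacency @ embeddings @ w_gc
    return embeddings + np.tanh(messages)


def dicg_like_forward(observations, params, num_layers=2):
    """Encode observations, infer the graph, then run graph convolutions.

    Returns the soft adjacency matrix and per-agent action probabilities.
    """
    embeddings = np.tanh(observations @ params["w_enc"])
    adjacency = infer_graph(embeddings, params["w_att"])
    for _ in range(num_layers):
        embeddings = graph_conv(adjacency, embeddings, params["w_gc"])
    logits = embeddings @ params["w_out"]
    return adjacency, softmax(logits, axis=-1)


def init_params(obs_dim, hidden_dim, num_actions, seed=0):
    """Small random weights; purely illustrative, not trained."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "w_enc": scale * rng.normal(size=(obs_dim, hidden_dim)),
        "w_att": scale * rng.normal(size=(hidden_dim, hidden_dim)),
        "w_gc": scale * rng.normal(size=(hidden_dim, hidden_dim)),
        "w_out": scale * rng.normal(size=(hidden_dim, num_actions)),
    }
```

With four agents, six-dimensional observations, and five actions, `dicg_like_forward(obs, init_params(6, 8, 5))` returns a 4×4 row-stochastic adjacency matrix and a 4×5 matrix of action probabilities. A dense soft adjacency is used in this sketch so that the graph stays differentiable and could, in principle, be trained end to end by an actor-critic loss.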

Related benchmarks

Task | Dataset | Result | Rank
Multi-Agent Reinforcement Learning | MPE Speaker-Listener | Return: 23.2 | 17
Cooperative Multi-Agent Reinforcement Learning | Disperse (last 2% of train) | Mean Episodic Reward: -0.36 | 13
Cooperative Multi-Agent Reinforcement Learning | Reference (last 2% of train) | Mean Episodic Reward: -34.97 | 13
Cooperative Multi-Agent Reinforcement Learning | Speaker-Listener (last 2% of train) | Mean Episodic Reward: -22.45 | 13
Cooperative Multi-Agent Reinforcement Learning | Adversary (last 2% of train) | Mean Episodic Reward: 36.51 | 13
Cooperative Multi-Agent Reinforcement Learning | Crypto (last 2% of train) | Mean Episodic Reward: 13.35 | 13
Multi-Agent Reinforcement Learning | MPE Reference (test) | Final Test Return: 35.1 | 6
Multi-Agent Reinforcement Learning | MPE Adversary (test) | Final Test Return: 34.3 | 6
Multi-Agent Reinforcement Learning | MPE Push (test) | Final Return: 10.2 | 6
