Counterfactual Multi-Agent Policy Gradients

About

Cooperative multi-agent systems can be naturally used to model many real world problems, such as network packet routing and the coordination of autonomous vehicles. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action, while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best performing agents are competitive with state-of-the-art centralised controllers that get access to the full state.

Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson • 2017
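
The counterfactual baseline described in the abstract can be illustrated with a minimal sketch. It assumes the centralised critic has already returned, in one forward pass, a vector of Q-values for every action available to a single agent, with the other agents' actions held fixed. The function name `counterfactual_advantage` and its arguments are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np


def counterfactual_advantage(q_values, policy_probs, action_taken):
    """Advantage of one agent's chosen action against a counterfactual baseline.

    q_values:      critic estimates Q(s, (u^-a, u'^a)) for each action u'^a of
                   agent a, with the other agents' actions held fixed
                   (hypothetical input, assumed to come from one critic pass).
    policy_probs:  agent a's policy probabilities over the same actions.
    action_taken:  index of the action agent a actually executed.
    """
    q_values = np.asarray(q_values, dtype=float)
    policy_probs = np.asarray(policy_probs, dtype=float)

    # Baseline: marginalise out agent a's own action under its current policy.
    baseline = float(np.dot(policy_probs, q_values))

    # Advantage: how much better the chosen action is than the agent's
    # policy-weighted average, all else being equal.
    return float(q_values[action_taken] - baseline)


# Toy numbers, not taken from the paper.
q = [1.0, 2.0, 4.0]
pi = [0.5, 0.25, 0.25]
advantages = [counterfactual_advantage(q, pi, a) for a in range(len(q))]
```

Because the baseline depends only on the agent's own policy and the fixed actions of the others, the policy-weighted mean of these advantages is zero. A baseline of this kind therefore leaves the expected policy gradient unchanged.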

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Multi-Agent Reinforcement Learning | Level-Based Foraging 10x10-3p-5f v2 (test) | Final Episode Return: 12 | 10 |
| Multi-Agent Reinforcement Learning | Level-Based Foraging 2s-10x10-3p-3f v2 (test) | Final Episode Return: 20 | 10 |
| Multi-Agent Reinforcement Learning | SMAC 1c3s5z (test) | Test Win Rate: 31 | 10 |
| Multi-Agent Reinforcement Learning | Level-Based Foraging 2s-8x8-2p-2f-coop v2 (test) | Final Episode Return: 6 | 10 |
| Multi-Agent Reinforcement Learning | Level-Based Foraging 10x10-4p-3f v2 (test) | Final Episode Return: 4 | 10 |
| Multi-agent unit micromanagement | StarCraft scenario 5w unit micromanagement benchmark (test) | Mean Win Percentage: 82 | 9 |
| Multi-agent unit micromanagement | StarCraft unit micromanagement benchmark scenario 5m (final 1000 evaluation episodes) | Mean Win Rate: 81 | 9 |
| Multi-agent unit micromanagement | StarCraft 2d_3z (final 1000 evaluation) | Mean Win Rate: 47 | 9 |
| Multi-Agent Reinforcement Learning | SMAC 3s5z (test) | Test Win Rate: 1 | 8 |
| Multi-agent unit micromanagement | StarCraft scenario 3m (evaluation) | Mean Win Percentage: 87 | 7 |
Showing 10 of 22 rows
