
Counterfactual Multi-Agent Policy Gradients

About

Cooperative multi-agent systems can be naturally used to model many real world problems, such as network packet routing and the coordination of autonomous vehicles. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action, while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best performing agents are competitive with state-of-the-art centralised controllers that get access to the full state.
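
The counterfactual baseline described above can be illustrated with a small numerical sketch. It assumes the centralised critic has already produced, in one forward pass, a Q-value for every candidate action of a single agent while the other agents' actions are held fixed. The baseline is then the policy-weighted average of those Q-values, and the advantage is the Q-value of the action actually taken minus that baseline. The function name, argument names, and example numbers are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np


def counterfactual_advantage(q_values, policy, action):
    """Sketch of a COMA-style advantage for one agent (hypothetical helper).

    q_values: shape (n_actions,), assumed critic output Q(s, (u^-a, u'^a)) for
              each candidate action u'^a of this agent, with the other agents'
              actions u^-a held fixed.
    policy:   shape (n_actions,), this agent's action probabilities.
    action:   index of the action the agent actually took.
    """
    q_values = np.asarray(q_values, dtype=float)
    policy = np.asarray(policy, dtype=float)
    # Counterfactual baseline: marginalise out this agent's own action.
    baseline = float(np.dot(policy, q_values))
    # Advantage of the chosen action relative to that baseline.
    return float(q_values[action] - baseline)


# Illustrative numbers only (not taken from the paper).
q = [1.0, 3.0, 2.0]
pi = [0.5, 0.25, 0.25]
adv = counterfactual_advantage(q, pi, action=1)
print(adv)
```

Because the baseline depends only on the agent's policy and the other agents' actions, not on the action the agent itself took, the policy-weighted sum of these advantages over the agent's actions is zero.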

Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson • 2017

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| 5m vs 6m | SMAC | Win Rate 0.3 | 13 |
| 6h vs 8z | SMAC | Win Rate 80.4 | 12 |
| 10m vs 11m | SMAC | Win Rate 1.3 | 12 |
| 3s5z vs 3s6z | SMAC | Win Rate 0.00e+0 | 12 |
| Multi-Agent Reinforcement Learning | Level-Based Foraging 10x10-3p-5f v2 (test) | Final Episode Return 12 | 10 |
| Multi-Agent Reinforcement Learning | Level-Based Foraging 2s-10x10-3p-3f v2 (test) | Final Episode Return 20 | 10 |
| Multi-Agent Reinforcement Learning | SMAC 1c3s5z (test) | Test Win Rate 31 | 10 |
| Multi-Agent Reinforcement Learning | Level-Based Foraging 2s-8x8-2p-2f-coop v2 (test) | Final Episode Return 6 | 10 |
| Multi-Agent Reinforcement Learning | Level-Based Foraging 10x10-4p-3f v2 (test) | Final Episode Return 4 | 10 |
| Multi-agent unit micromanagement | StarCraft scenario 5w unit micromanagement benchmark (test) | Mean Win Percentage 82 | 9 |
Showing 10 of 34 rows
