
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

About

We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment, while policy gradient suffers from a variance that increases as the number of agents grows. We then present an adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination. Additionally, we introduce a training regimen utilizing an ensemble of policies for each agent that leads to more robust multi-agent policies. We show the strength of our approach compared to existing methods in cooperative as well as competitive scenarios, where agent populations are able to discover various physical and informational coordination strategies.
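The core adaptation described above — a critic that conditions on all agents' observations and actions (removing the non-stationarity an independent learner would see), paired with decentralized actors that use only local observations — can be sketched as follows. This is a minimal NumPy illustration of the centralized-critic input construction, not the authors' implementation; the network sizes, helper names, and random initialization are assumptions for the sake of a runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Randomly initialized weights for a small MLP (illustrative only)."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Forward pass with tanh hidden activations and a linear output."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

n_agents, obs_dim, act_dim = 3, 4, 2

# Decentralized actors: each maps its OWN observation to an action.
actors = [mlp([obs_dim, 16, act_dim]) for _ in range(n_agents)]

# Centralized critics: each scores the JOINT observations and actions of
# all agents, so the environment looks stationary from the critic's view.
critic_in = n_agents * (obs_dim + act_dim)
critics = [mlp([critic_in, 32, 1]) for _ in range(n_agents)]

obs = [rng.standard_normal(obs_dim) for _ in range(n_agents)]
acts = [np.tanh(forward(actors[i], obs[i])) for i in range(n_agents)]

joint = np.concatenate(obs + acts)  # the critic sees everything
q_values = [forward(critics[i], joint).item() for i in range(n_agents)]
print(q_values)
```

Only the critics take the joint input; at execution time the critics are discarded and each agent acts from its local observation alone, which is what makes the learned policies deployable in a decentralized setting.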

Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch • 2017

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-agent cooperation | Simple_Tag 6 agents | Average Reward | 138.5 | 42 |
| Multi-Agent Reinforcement Learning | TREA rdist | Mean Episodic Reward | 3.24e+4 | 42 |
| Multi-agent cooperation | Simple_Tag 9 agents | Average Reward | 83.4 | 42 |
| Multi-agent cooperation | Simple_Tag 3 agents | Average Reward | 75.9 | 42 |
| Multi-Agent Reinforcement Learning | REF rdete | Mean Episodic Reward | -44 | 21 |
| Multi-Agent Reinforcement Learning | REF rdist | Mean Episodic Reward | -43 | 21 |
| Multi-Agent Reinforcement Learning | TREA rdete | Mean Episodic Reward | -428 | 21 |
| Multi-Agent Reinforcement Learning | CN rac-dist | Mean Episodic Reward | 802 | 21 |
| Multi-Agent Reinforcement Learning | CN rdete | Mean Episodic Reward | -169 | 21 |
| Multi-Agent Reinforcement Learning | CN rdist | Mean Episodic Reward | -213 | 21 |

Showing 10 of 48 rows.
