Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification

About

Conservatism has led to significant progress in offline reinforcement learning (RL), where an agent learns from pre-collected datasets. However, as many real-world scenarios involve interaction among multiple agents, it is important to address offline RL in the multi-agent setting. Given the recent success of transferring online RL algorithms to the multi-agent setting, one may expect that offline RL algorithms will also transfer to multi-agent settings directly. Surprisingly, we empirically observe that conservative offline RL algorithms do not work well in the multi-agent setting: the performance degrades significantly with an increasing number of agents. Towards mitigating the degradation, we identify a key issue: the non-concavity of the value function makes policy gradient improvements prone to local optima. Multiple agents exacerbate the problem severely, since a suboptimal policy of any single agent can lead to uncoordinated global failure. Following this intuition, we propose a simple yet effective method, Offline Multi-Agent RL with Actor Rectification (OMAR), which combines first-order policy gradients and zeroth-order optimization methods to better optimize the conservative value functions over the actor parameters. Despite its simplicity, OMAR achieves state-of-the-art results in a variety of multi-agent control tasks.
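
The intuition about local optima can be illustrated with a toy example. The sketch below is not the authors' implementation: the one-dimensional value function `q_value`, the helper names, the sampling-based search, and all hyperparameters are assumptions chosen only for illustration. It shows gradient ascent stalling on a local peak of a non-concave value function, and a simple zeroth-order (sample, keep the best, refit) search proposing a better action that gradient steps then refine.

```python
import numpy as np

def q_value(a):
    """Hypothetical non-concave value: local peak near a=-1, global peak near a=2."""
    return np.exp(-(a + 1.0) ** 2) + 2.0 * np.exp(-(a - 2.0) ** 2)

def q_grad(a):
    """Analytic derivative of q_value with respect to the action."""
    return (-2.0 * (a + 1.0) * np.exp(-(a + 1.0) ** 2)
            - 4.0 * (a - 2.0) * np.exp(-(a - 2.0) ** 2))

def gradient_ascent(a0, lr=0.1, steps=200):
    """First-order improvement: follow the gradient from the starting action."""
    a = a0
    for _ in range(steps):
        a = a + lr * q_grad(a)
    return a

def sampling_search(a0, rng, iters=5, n_samples=128, sigma0=2.0, n_elite=8):
    """Zeroth-order improvement: sample actions, keep the elites, refit the sampler."""
    mean, sigma = a0, sigma0
    best = a0
    for _ in range(iters):
        samples = rng.normal(mean, sigma, size=n_samples)
        values = q_value(samples)
        # Refit the sampling distribution to the highest-valued samples.
        elite = samples[np.argsort(values)[-n_elite:]]
        mean, sigma = elite.mean(), elite.std() + 1e-3
        # Track the best action seen so far; never accept a worse one.
        candidate = samples[np.argmax(values)]
        if q_value(candidate) > q_value(best):
            best = candidate
    return best

rng = np.random.default_rng(0)
a_start = -1.5

# Gradient steps alone climb the nearby (local) peak.
a_grad_only = gradient_ascent(a_start)

# The sampled candidate can land in a better basin; gradient steps then refine it.
a_candidate = sampling_search(a_grad_only, rng)
a_combined = gradient_ascent(a_candidate)

print("gradient only:", a_grad_only, q_value(a_grad_only))
print("combined:     ", a_combined, q_value(a_combined))
```

In this toy setting the candidate is only accepted when it scores higher under `q_value`, so the combined procedure cannot do worse than the gradient-only action. How the paper couples the two signals for actor parameters and conservative value functions is not specified in the abstract.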

Ling Pan, Longbo Huang, Tengyu Ma, Huazhe Xu • 2021

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| StarCraft II micromanagement | StarCraft II 2s3z mixed | Win Rate | 60 | 18 |
| StarCraft II micromanagement | StarCraft II 2s3z expert | Win Rate | 95 | 18 |
| StarCraft II micromanagement | StarCraft II 5m_vs_6m medium | Win Rate | 19 | 18 |
| StarCraft II micromanagement | StarCraft II 2s3z medium_replay | Win Rate | 24 | 18 |
| StarCraft II micromanagement | StarCraft II 5m_vs_6m mixed | Win Rate | 10 | 18 |
| StarCraft II micromanagement | StarCraft II 5m_vs_6m medium_replay | Win Rate | 3 | 18 |
| StarCraft II micromanagement | StarCraft II 6h_vs_8z medium | Test Winning Rate | 4 | 18 |
| StarCraft II micromanagement | StarCraft II 2s3z medium | Win Rate | 15 | 18 |
| StarCraft II micromanagement | StarCraft II 3s_vs_5z medium | Win Rate | 0 | 18 |
| StarCraft II micromanagement | StarCraft II 3s_vs_5z expert | Win Rate | 64 | 18 |
Showing 10 of 118 rows