
Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning

About

Offline multi-agent reinforcement learning is challenging because it couples two problems: the distribution shift common to the offline setting and the high dimensionality common to the multi-agent setting. Together these make out-of-distribution (OOD) actions and value overestimation especially severe. To mitigate this problem, we propose a novel multi-agent offline RL algorithm, CounterFactual Conservative Q-Learning (CFCQL), which performs conservative value estimation. Rather than treating all agents as a single high-dimensional agent and applying single-agent methods directly, CFCQL computes a conservative regularization term for each agent separately in a counterfactual way and then linearly combines these terms to obtain an overall conservative value estimate. We prove that CFCQL retains the underestimation property and the performance guarantee of single-agent conservative methods, while the induced regularization and safe policy improvement bound are independent of the number of agents. It is therefore theoretically superior to the direct treatment above, especially when the number of agents is large. We further conduct experiments on four environments, covering both discrete and continuous action settings, on both existing datasets and datasets we constructed, and show that CFCQL outperforms existing methods on most datasets, by a remarkable margin on some of them.

Jianzhun Shao, Yun Qu, Chen Chen, Hongchang Zhang, Xiangyang Ji • 2023
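
The per-agent counterfactual regularization described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a tabular joint Q-function for a single state, stored as a dictionary keyed by joint-action tuples, and it assumes that "counterfactual" means resampling one agent's action from its learned policy while the other agents keep their dataset actions. The function name `counterfactual_penalty`, its arguments, and the uniform default weights are all hypothetical choices made for illustration.

```python
def counterfactual_penalty(q, policies, data_actions, weights=None):
    """Sketch of a per-agent conservative penalty, linearly combined.

    q            : dict mapping joint-action tuples to Q-values (one state).
    policies     : policies[i][a] is agent i's probability of action a.
    data_actions : list of joint-action tuples observed in the dataset.
    weights      : per-agent combination weights (assumed uniform if None).
    """
    n_agents = len(policies)
    if weights is None:
        weights = [1.0 / n_agents] * n_agents

    total = 0.0
    for i in range(n_agents):
        gap_sum = 0.0
        for joint in data_actions:
            # Expected Q when only agent i deviates from the dataset action.
            expected_cf = 0.0
            for a, prob in enumerate(policies[i]):
                cf_joint = tuple(joint[:i]) + (a,) + tuple(joint[i + 1:])
                expected_cf += prob * q[cf_joint]
            # Gap between the counterfactual value and the dataset value.
            gap_sum += expected_cf - q[tuple(joint)]
        total += weights[i] * gap_sum / len(data_actions)
    return total


# Two agents with two actions each; one dataset sample at joint action (0, 0).
q_table = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
uniform = [[0.5, 0.5], [0.5, 0.5]]
penalty = counterfactual_penalty(q_table, uniform, [(0, 0)])
```

In this sketch each agent's term evaluates only that agent's own action set, so the cost of the penalty grows with the sum of the per-agent action counts rather than with the size of the joint action space. The penalty is zero when every agent's policy puts all its probability on the dataset action.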

Related benchmarks

Task | Dataset | Result | Rank
StarCraft II micromanagement | StarCraft II 2s3z expert | Win Rate 99 | 24
StarCraft II micromanagement | StarCraft II 5m_vs_6m medium | Win Rate 29 | 24
StarCraft II micromanagement | StarCraft II 2s3z medium_replay | Win Rate 55 | 24
StarCraft II micromanagement | StarCraft II 5m_vs_6m medium_replay | Win Rate 22 | 24
StarCraft II micromanagement | StarCraft II 2s3z medium | Win Rate 40 | 24
StarCraft II micromanagement | StarCraft II 6h_vs_8z medium | Test Winning Rate 41 | 24
Multi-Agent Reinforcement Learning | MPE Cooperative Navigation (CN) v1 (Expert) | Normalized Score 112 | 19
StarCraft II micromanagement | StarCraft II 3s_vs_5z expert | Win Rate 99 | 18
StarCraft II micromanagement | StarCraft II 3s_vs_5z mixed | Win Rate 60 | 18
StarCraft II micromanagement | StarCraft II 3s_vs_5z medium | Win Rate 28 | 18
Showing 10 of 74 rows
...
