
MACCA: Offline Multi-agent Reinforcement Learning with Causal Credit Assignment

About

Offline Multi-agent Reinforcement Learning (MARL) is valuable in scenarios where online interaction is impractical or risky. While independent learning in MARL offers flexibility and scalability, accurately assigning credit to individual agents in offline settings is challenging because interaction with the environment is prohibited. In this paper, we propose a new framework, Multi-Agent Causal Credit Assignment (MACCA), to address credit assignment in the offline MARL setting. MACCA characterizes the generative process as a Dynamic Bayesian Network that captures the relationships between environmental variables, states, actions, and rewards. By estimating this model on offline data, MACCA learns each agent's contribution by analyzing the causal relationships of their individual rewards, ensuring accurate and interpretable credit assignment. Additionally, the modularity of our approach allows it to integrate seamlessly with various offline MARL methods. Theoretically, we prove that, in the offline dataset setting, the underlying causal structure and the function generating the agents' individual rewards are identifiable, which lays the foundation for the correctness of our modeling. In our experiments, we demonstrate that MACCA not only outperforms state-of-the-art methods but also enhances performance when integrated with other backbones.
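To make the idea of recovering individual rewards from offline data more concrete, here is a deliberately simplified sketch. It is not the MACCA model: it assumes (hypothetically) that the logged team reward is the sum of per-agent rewards, that each agent's reward is linear in that agent's own state-action features, and that the data are noise-free. Under those assumptions, ordinary least squares on the logged data recovers each agent's reward function. All names (`fit_individual_rewards`, `individual_rewards`, the synthetic data) are illustrative only.

```python
import numpy as np


def fit_individual_rewards(features, team_reward):
    """Fit one linear reward model per agent from logged team rewards.

    features:    array of shape (T, n_agents, d), per-agent state-action features
    team_reward: array of shape (T,), assumed to equal the sum of individual rewards
    Returns an array of shape (n_agents, d) with the estimated per-agent weights.
    """
    T, n_agents, d = features.shape
    # Stack all agents' features side by side so one regression fits every agent.
    X = features.reshape(T, n_agents * d)
    w, *_ = np.linalg.lstsq(X, team_reward, rcond=None)
    return w.reshape(n_agents, d)


def individual_rewards(features, weights):
    """Per-agent reward estimates of shape (T, n_agents)."""
    return np.einsum("tad,ad->ta", features, weights)


# Synthetic "offline dataset" (assumption: linear, additive, noise-free rewards).
rng = np.random.default_rng(0)
T, n_agents, d = 500, 3, 4
true_w = rng.normal(size=(n_agents, d))
feats = rng.normal(size=(T, n_agents, d))
team_r = np.einsum("tad,ad->t", feats, true_w)

w_hat = fit_individual_rewards(feats, team_r)
r_hat = individual_rewards(feats, w_hat)  # estimated credit per agent per step
```

In this toy setting the per-agent estimates in `r_hat` sum back to the logged team reward. The framework described above instead learns a causal structure over states, actions, and rewards, which this sketch does not attempt to reproduce.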

Ziyan Wang, Yali Du, Yudi Zhang, Meng Fang, Biwei Huang • 2023

Related benchmarks

Task | Dataset | Result | Rank
StarCraft II micromanagement | StarCraft II 2s3z medium | Win Rate 55 | 24
StarCraft II micromanagement | StarCraft II 5m_vs_6m medium_replay | Win Rate 28 | 24
StarCraft II micromanagement | StarCraft II 2s3z medium_replay | Win Rate 59 | 24
StarCraft II micromanagement | StarCraft II 2s3z expert | Win Rate 99 | 24
StarCraft II micromanagement | StarCraft II 6h_vs_8z medium | Test Winning Rate 22 | 24
StarCraft II micromanagement | StarCraft II 5m_vs_6m medium | Win Rate 20 | 24
Multi-Agent Reinforcement Learning | MPE Cooperative Navigation (CN) v1 (Expert) | Normalized Score 111.7 | 19
StarCraft II micromanagement | StarCraft II 5m_vs_6m expert | Win Rate 88 | 14
StarCraft II micromanagement | StarCraft II 6h_vs_8z medium_replay | Win Rate 25 | 14
StarCraft II micromanagement | StarCraft II 6h_vs_8z expert | Win Rate 75 | 14
Showing 10 of 23 rows
