MADiff: Offline Multi-agent Learning with Diffusion Models

About

Offline reinforcement learning (RL) aims to learn policies from pre-existing datasets without further interactions, making it a challenging task. Q-learning algorithms struggle with extrapolation errors in offline settings, while supervised learning methods are constrained by model expressiveness. Recently, diffusion models (DMs) have shown promise in overcoming these limitations in single-agent learning, but their application in multi-agent scenarios remains unclear. Generating trajectories for each agent with independent DMs may impede coordination, while concatenating all agents' information can lead to low sample efficiency. Accordingly, we propose MADiff, which is realized with an attention-based diffusion model to model the complex coordination among behaviors of multiple agents. To our knowledge, MADiff is the first diffusion-based multi-agent learning framework, functioning as both a decentralized policy and a centralized controller. During decentralized executions, MADiff simultaneously performs teammate modeling, and the centralized controller can also be applied in multi-agent trajectory predictions. Our experiments demonstrate that MADiff outperforms baseline algorithms across various multi-agent learning tasks, highlighting its effectiveness in modeling complex multi-agent interactions. Our code is available at https://github.com/zbzhu99/madiff.
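To illustrate the contrast the abstract draws between independent per-agent models and concatenating all agents' information, here is a minimal sketch of self-attention applied across the agent axis. It is not taken from the MADiff code base and does not reproduce its architecture. The function name `agent_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and all shapes are assumptions chosen for illustration.

```python
import numpy as np

def agent_attention(features, w_q, w_k, w_v):
    """Scaled dot-product self-attention across the agent axis.

    features: (n_agents, d_model) array, one feature vector per agent.
    w_q, w_k, w_v: (d_model, d_head) projection matrices (hypothetical names).
    Returns (mixed, weights) with shapes (n_agents, d_head) and
    (n_agents, n_agents).
    """
    q = features @ w_q
    k = features @ w_k
    v = features @ w_v
    # Pairwise agent-to-agent scores, scaled by sqrt of the head dimension.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Numerically stable softmax over the "other agents" axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each agent's output is a weighted mix of all agents' value vectors.
    return weights @ v, weights

# Illustrative shapes only: 3 agents, 8-dim features, 4-dim attention head.
rng = np.random.default_rng(0)
n_agents, d_model, d_head = 3, 8, 4
features = rng.normal(size=(n_agents, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
mixed, weights = agent_attention(features, w_q, w_k, w_v)
```

In this sketch the parameter count does not depend on the number of agents, whereas a model that concatenates all agents' features grows with it. Each row of `weights` shows how strongly one agent attends to every other agent.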

Zhengbang Zhu, Minghuan Liu, Liyuan Mao, Bingyi Kang, Minkai Xu, Yong Yu, Stefano Ermon, Weinan Zhang • 2023

Related benchmarks

Task	Dataset	Result
Multi-agent Trajectory Prediction	NBA dataset	ADE 7.92	26
Multi-agent Trajectory Prediction	Football Trajectory Dataset (test)	JADE 0.58	20
Multi-Agent Reinforcement Learning	MPE Cooperative Navigation (CN) v1 (Expert)	Normalized Score 95	19
Multi-agent continuous control	MA-MuJoCo 6Halfcheetah-Medium	Average Performance 4.41e+3	16
Trajectory Prediction	NBA SportVU 2015-2016 season (test)	minADE@1.0s 0.25	15
Multi-agent Navigation	Empty Map (test)	Success Rate 55	12
Multi-agent Navigation	Obstacle Map (test)	Average Success Rate 18	12
Multi-agent Navigation	Barrier Map (test)	Average Success Rate 37	12
2halfcheetah	MA Mujoco 2halfcheetah offline (Good)	Average Score 8.51e+3	10
Offline Reinforcement Learning	Gaussian Squeeze N=1000 (Expert)	Normalized Return 70.2	10

Showing 10 of 64 rows
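Several of the trajectory-prediction rows above report displacement-error metrics. As a reading aid, the sketch below computes the standard average displacement error (ADE), the mean Euclidean distance between predicted and ground-truth positions, and a best-of-K variant. The exact conventions behind each listed number (units, horizon, and whether the minimum is taken per agent or over joint samples) are not stated on this page, so the joint minimum used in `min_ade` is an assumption. The function names and array shapes are illustrative.

```python
import numpy as np

def ade(pred, target):
    """Average displacement error.

    pred, target: (n_agents, n_steps, 2) arrays of xy positions.
    Returns the mean Euclidean distance over all agents and time steps.
    """
    return float(np.linalg.norm(pred - target, axis=-1).mean())

def min_ade(samples, target):
    """Best-of-K ADE, taking the minimum over joint samples (assumed convention).

    samples: (k, n_agents, n_steps, 2) array of K sampled joint predictions.
    target:  (n_agents, n_steps, 2) array of ground-truth positions.
    """
    per_sample = np.linalg.norm(samples - target, axis=-1).mean(axis=(1, 2))
    return float(per_sample.min())

# Toy check: every predicted point is offset by (3, 4), i.e. a distance of 5.
target = np.zeros((2, 3, 2))
pred = target + np.array([3.0, 4.0])
ade_value = ade(pred, target)          # 5.0
samples = np.stack([pred, target])     # second sample matches the ground truth
best_value = min_ade(samples, target)  # 0.0
```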
