Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning

About

Diffusion Policies have become widely used in Imitation Learning, offering several appealing properties, such as generating multimodal and discontinuous behavior. As models grow larger to capture more complex capabilities, their computational demands increase, as shown by recent scaling laws. Continuing with current architectures will therefore present a computational roadblock. To address this gap, we propose Mixture-of-Denoising Experts (MoDE) as a novel policy for Imitation Learning. MoDE surpasses current state-of-the-art Transformer-based Diffusion Policies while enabling parameter-efficient scaling through sparse experts and noise-conditioned routing, reducing active parameters by 40% and inference costs by 90% via expert caching. Our architecture combines this efficient scaling with a noise-conditioned self-attention mechanism, enabling more effective denoising across different noise levels. MoDE achieves state-of-the-art performance on 134 tasks in four established imitation learning benchmarks (CALVIN and LIBERO). Notably, by pretraining MoDE on diverse robotics data, we achieve 4.01 on CALVIN ABC and 0.95 on LIBERO-90. It surpasses both CNN-based and Transformer Diffusion Policies by an average of 57% across 4 benchmarks, while using 90% fewer FLOPs and fewer active parameters than default Diffusion Transformer architectures. Furthermore, we conduct comprehensive ablations on MoDE's components, providing insights for designing efficient and scalable Transformer architectures for Diffusion Policies. Code and demonstrations are available at https://mbreuss.github.io/MoDE_Diffusion_Policy/.

Moritz Reuss, Jyothish Pari, Pulkit Agrawal, Rudolf Lioutikov • 2024
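
The routing and caching idea in the abstract above can be illustrated with a minimal NumPy sketch. It assumes, as the phrase "noise-conditioned routing" suggests, that the router sees only the noise level and not the tokens, so the expert choice for each denoising step can be computed once and reused. All names, sizes, the sinusoidal embedding, and the tanh experts are hypothetical and are not taken from the MoDE code base.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
NUM_EXPERTS, TOP_K, DIM = 4, 2, 8

# Hypothetical parameters: one router matrix and one weight matrix per expert.
router_w = rng.normal(size=(DIM, NUM_EXPERTS))
expert_w = rng.normal(size=(NUM_EXPERTS, DIM, DIM))


def noise_embedding(sigma):
    """Sinusoidal embedding of a scalar noise level (an assumed choice)."""
    freqs = np.exp(np.linspace(0.0, 4.0, DIM // 2))
    angles = np.log(sigma) * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])


def route(sigma):
    """Select the top-k experts and their mixing weights from sigma alone."""
    logits = noise_embedding(sigma) @ router_w
    top = np.argsort(logits)[-TOP_K:]
    weights = np.exp(logits[top] - logits[top].max())
    return top, weights / weights.sum()


def apply_experts(tokens, top, weights):
    """Run only the selected experts and mix their outputs."""
    outputs = np.stack([np.tanh(tokens @ expert_w[i]) for i in top])
    return np.tensordot(weights, outputs, axes=1)


def moe_layer(tokens, sigma):
    """Reference path: evaluate the router on every call."""
    top, weights = route(sigma)
    return apply_experts(tokens, top, weights)


def build_cache(sigmas):
    """Precompute the routing decision once per noise level."""
    return {sigma: route(sigma) for sigma in sigmas}


def cached_layer(tokens, sigma, cache):
    """Cached path: look up the routing decision instead of recomputing it."""
    top, weights = cache[sigma]
    return apply_experts(tokens, top, weights)


# A fixed, hypothetical noise schedule and a batch of 5 tokens.
sigmas = [80.0, 10.0, 1.0, 0.1]
cache = build_cache(sigmas)
tokens = rng.normal(size=(5, DIM))
```

In this sketch, `cached_layer` gives the same output as `moe_layer` for every noise level in the schedule while skipping the router and touching only `TOP_K` of the `NUM_EXPERTS` experts. How MoDE itself stores or merges the selected experts is not specified in the abstract.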

Related benchmarks

Task	Dataset	Metric	Result
Robot Manipulation	LIBERO	--	--	957
Robotic Manipulation	Calvin ABCD→D	Avg Length	3.92	130
Long-horizon robot manipulation	Calvin ABCD→D	Task 1 Completion Rate	97.1	127
Robot Manipulation	Calvin ABC->D	Average Successful Length	4.01	62
Robotic Manipulation	LIBERO-10	Success Rate	94	54
Robotic Manipulation	RLBench (test)	Average Success Rate	45	49
Long-horizon robotic manipulation	Calvin ABC->D	Average Trajectory Length	3.39	40
Continual Imitation Learning	LIBERO goal 1.0	FWT (Forward Transfer)	71	34
Continual Imitation Learning	LIBERO long 1.0	Forward Transfer (FWT)	71.6	23
Robot Manipulation	LIBERO-10	Success Rate	51.9	23

Showing 10 of 24 rows
