
What Makes a Good Diffusion Planner for Decision Making?

About

Diffusion models have recently shown significant potential in solving decision-making problems, particularly in generating behavior plans, an approach known as diffusion planning. While numerous studies have demonstrated the impressive performance of diffusion planning, the mechanisms behind the key components of a good diffusion planner remain unclear, and design choices are highly inconsistent across existing studies. In this work, we address this issue through systematic empirical experiments on diffusion planning in an offline reinforcement learning (RL) setting, providing practical insights into the essential components of diffusion planning. We trained and evaluated over 6,000 diffusion models, identifying critical components such as guided sampling, network architecture, action generation, and planning strategy. We find that some design choices that run counter to common practice in prior diffusion planning work actually lead to better performance, e.g., unconditional sampling with selection can be better than guided sampling, and a Transformer outperforms a U-Net as the denoising network. Based on these insights, we suggest a simple yet strong diffusion planning baseline that achieves state-of-the-art results on standard offline RL benchmarks.
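To make "unconditional sampling with selection" concrete, here is a minimal sketch of the general idea: draw several candidate plans without any guidance, score each one, and keep the highest-scoring plan. This is not the authors' implementation. The sampler below is a random-noise stand-in for a trained diffusion model, and `sample_plans`, `score_plans`, and `select_plan` are hypothetical names with an arbitrary toy scoring rule.

```python
import numpy as np


def sample_plans(rng, num_candidates, horizon, state_dim):
    """Hypothetical stand-in for an unconditional diffusion sampler.

    A real planner would run the reverse diffusion process here; this
    sketch just draws Gaussian noise with the same output shape.
    """
    return rng.standard_normal((num_candidates, horizon, state_dim))


def score_plans(plans):
    """Hypothetical critic: one scalar per plan, higher is better.

    Assumption for illustration only: plans that stay close to the
    origin are preferred.
    """
    return -np.square(plans).sum(axis=(1, 2))


def select_plan(plans, scores):
    """Selection step: keep the candidate with the highest score."""
    return plans[int(np.argmax(scores))]


rng = np.random.default_rng(0)
plans = sample_plans(rng, num_candidates=16, horizon=8, state_dim=3)
scores = score_plans(plans)
best = select_plan(plans, scores)
print(best.shape)  # (8, 3)
```

In guided sampling, by contrast, the scoring signal steers every denoising step; in this sketch it is used only once, after sampling has finished.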

Haofei Lu, Dongqi Han, Yifei Shen, Dongsheng Li • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Locomotion | D4RL walker2d-medium-expert | Normalized Score | 108.6 | 63 |
| Locomotion | D4RL HalfCheetah Medium-Replay | Normalized Score | 0.458 | 61 |
| Locomotion | D4RL Halfcheetah medium | Normalized Score | 50.4 | 60 |
| Locomotion | D4RL Walker2d medium | Normalized Score | 82.8 | 60 |
| Locomotion | D4RL halfcheetah-medium-expert | Normalized Score | 92.7 | 53 |
| Offline Reinforcement Learning | D4RL antmaze-large (diverse) | Normalized Score | 76 | 37 |
| Offline Reinforcement Learning | D4RL antmaze-large (play) | Normalized Score | 0.764 | 36 |
| Offline Reinforcement Learning | D4RL Franka Kitchen | Mixed Success Rate | 73.6 | 34 |
| Offline Reinforcement Learning | D4RL Maze2D | Return (UMaze) | 136.6 | 31 |
| Locomotion | D4RL Hopper medium | Normalized Score | 80.9 | 30 |
Showing 10 of 26 rows
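
For readers unfamiliar with the "Normalized Score" metric, D4RL defines it by rescaling a raw episode return so that a random policy maps to 0 and an expert policy maps to 100. The sketch below shows that formula. The function name and the reference returns in the usage lines are hypothetical placeholders, not the per-environment constants that D4RL ships.

```python
def d4rl_normalized_score(raw_return, random_return, expert_return):
    """Rescale a raw return so random -> 0 and expert -> 100."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Hypothetical reference returns, chosen only to make the arithmetic easy.
random_return = 0.0
expert_return = 200.0

print(d4rl_normalized_score(100.0, random_return, expert_return))  # 50.0
```

Some sources report the same ratio as a fraction instead of multiplying by 100, which may be why values such as 0.458 and 50.4 appear side by side in listings like this one.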
