
Planning with Diffusion for Flexible Behavior Synthesis

About

Model-based reinforcement learning methods often use learning only for the purpose of estimating an approximate dynamics model, offloading the rest of the decision-making work to classical trajectory optimizers. While conceptually simple, this combination has a number of empirical shortcomings, suggesting that learned models may not be well-suited to standard trajectory optimization. In this paper, we consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem, such that sampling from the model and planning with it become nearly identical. The core of our technical approach lies in a diffusion probabilistic model that plans by iteratively denoising trajectories. We show how classifier-guided sampling and image inpainting can be reinterpreted as coherent planning strategies, explore the unusual and useful properties of diffusion-based planning methods, and demonstrate the effectiveness of our framework in control settings that emphasize long-horizon decision-making and test-time flexibility.
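The planning loop described above can be sketched in a few lines. This is a minimal toy illustration, not the authors' implementation: it assumes a scalar state per timestep, a linear noise schedule, and a placeholder `toy_noise_model` standing in for a trained network. The names `plan`, `apply_conditioning`, `return_gradient`, and `guide_scale`, along with the quadratic objective, are hypothetical. It shows the two ideas named in the abstract: guidance nudges each denoising step along the gradient of a return estimate, and inpainting-style conditioning overwrites known states (for example the start and goal) after every step.

```python
import math
import random


def make_schedule(n_steps, beta_start=1e-4, beta_end=2e-2):
    """Linear noise schedule (an assumption); returns betas, alphas, cumulative alpha products."""
    betas = [beta_start + (beta_end - beta_start) * i / (n_steps - 1) for i in range(n_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def toy_noise_model(traj, step):
    """Hypothetical placeholder for a trained noise-prediction network (predicts zero noise)."""
    return [0.0 for _ in traj]


def return_gradient(traj, goal):
    """Gradient of the toy objective -0.5 * sum((s - goal)^2) with respect to each state."""
    return [goal - s for s in traj]


def apply_conditioning(traj, conditions):
    """Inpainting-style constraint: overwrite fixed timesteps with their known states."""
    for t, state in conditions.items():
        traj[t] = state
    return traj


def plan(horizon, conditions, goal, n_steps=50, guide_scale=1.0, seed=0):
    """Sample a trajectory by iterative denoising with guidance and conditioning."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(n_steps)
    # Start from pure noise, with the known states already filled in.
    traj = apply_conditioning([rng.gauss(0, 1) for _ in range(horizon)], conditions)
    for i in reversed(range(n_steps)):
        eps = toy_noise_model(traj, i)
        coef = betas[i] / math.sqrt(1.0 - alpha_bars[i])
        # Mean of the reverse (denoising) step.
        mean = [(x - coef * e) / math.sqrt(alphas[i]) for x, e in zip(traj, eps)]
        # Guidance: shift the mean along the objective gradient, scaled by the step variance.
        grad = return_gradient(mean, goal)
        mean = [m + guide_scale * betas[i] * g for m, g in zip(mean, grad)]
        # No noise is added at the final step.
        sigma = math.sqrt(betas[i]) if i > 0 else 0.0
        traj = [m + sigma * rng.gauss(0, 1) for m in mean]
        traj = apply_conditioning(traj, conditions)
    return traj


trajectory = plan(horizon=8, conditions={0: 0.0, 7: 1.0}, goal=1.0)
```

Because the conditioning is re-applied after every denoising step, the returned trajectory always starts and ends at the fixed states. Changing `conditions` or `return_gradient` at sampling time changes the plan without retraining, which is the kind of test-time flexibility the abstract refers to.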

Michael Janner, Yilun Du, Joshua B. Tenenbaum, Sergey Levine • 2022

Related benchmarks

| Task | Dataset | Result (Normalized Score) | Rank |
|---|---|---|---|
| Offline Reinforcement Learning | D4RL halfcheetah-medium-expert | 88.9 | 117 |
| Offline Reinforcement Learning | D4RL hopper-medium-expert | 107.2 | 115 |
| Offline Reinforcement Learning | D4RL walker2d-medium-expert | 108.4 | 86 |
| Offline Reinforcement Learning | D4RL Medium-Replay Hopper | 96.8 | 72 |
| Offline Reinforcement Learning | D4RL Medium-Replay HalfCheetah | 42.2 | 59 |
| Offline Reinforcement Learning | D4RL Medium HalfCheetah | 44.2 | 59 |
| Offline Reinforcement Learning | D4RL Medium Walker2d | 79.7 | 58 |
| Offline Reinforcement Learning | D4RL halfcheetah v2 (medium-replay) | 37.7 | 58 |
| hopper locomotion | D4RL hopper medium-replay | 96.8 | 56 |
| walker2d locomotion | D4RL walker2d medium-replay | 70.6 | 53 |

Showing 10 of 123 rows.
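For readers unfamiliar with the metric, D4RL normalized scores rescale a raw episode return so that a random policy maps to 0 and an expert policy maps to 100. The sketch below illustrates that convention. The function name `normalized_score` and the reference returns passed to it are hypothetical placeholders, not the benchmark's actual per-environment constants.

```python
def normalized_score(raw_return, random_return, expert_return):
    """Rescale a raw return so that random maps to 0 and expert maps to 100."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Hypothetical reference returns, chosen only to make the arithmetic easy to follow.
halfway = normalized_score(raw_return=50.0, random_return=0.0, expert_return=100.0)
above_expert = normalized_score(raw_return=120.0, random_return=0.0, expert_return=100.0)
```

Under this convention a score above 100, such as the 107.2 reported for hopper-medium-expert, means the policy's return exceeded the expert reference.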
