Planning with Diffusion for Flexible Behavior Synthesis

About

Model-based reinforcement learning methods often use learning only for the purpose of estimating an approximate dynamics model, offloading the rest of the decision-making work to classical trajectory optimizers. While conceptually simple, this combination has a number of empirical shortcomings, suggesting that learned models may not be well-suited to standard trajectory optimization. In this paper, we consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem, such that sampling from the model and planning with it become nearly identical. The core of our technical approach lies in a diffusion probabilistic model that plans by iteratively denoising trajectories. We show how classifier-guided sampling and image inpainting can be reinterpreted as coherent planning strategies, explore the unusual and useful properties of diffusion-based planning methods, and demonstrate the effectiveness of our framework in control settings that emphasize long-horizon decision-making and test-time flexibility.
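The idea of planning by iteratively denoising a trajectory, with inpainting used to fix known states and guidance used to bias samples toward high return, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the "denoiser" is a hand-written smoothing rule rather than a learned diffusion model, the return is a simple quadratic, and the names `toy_denoise_mean`, `reward_grad`, `apply_conditions`, and `plan` are hypothetical and not taken from the paper or its code.

```python
import random

def toy_denoise_mean(traj):
    """Hypothetical stand-in for a learned denoiser (NOT the paper's model):
    pulls each interior state toward the average of its neighbours."""
    out = list(traj)
    for i in range(1, len(traj) - 1):
        out[i] = 0.5 * traj[i] + 0.25 * (traj[i - 1] + traj[i + 1])
    return out

def reward_grad(traj, target):
    """Gradient of the toy return J = -sum((s - target)^2) w.r.t. each state."""
    return [-2.0 * (s - target) for s in traj]

def apply_conditions(traj, start, goal):
    """Inpainting-style conditioning: overwrite the states that are known."""
    traj = list(traj)
    traj[0] = start
    traj[-1] = goal
    return traj

def plan(start, goal, horizon=16, n_steps=50, guide_scale=0.0, target=0.0, seed=0):
    rng = random.Random(seed)
    # Begin from pure noise over the whole trajectory, then clamp the known states.
    traj = [rng.gauss(0.0, 1.0) for _ in range(horizon)]
    traj = apply_conditions(traj, start, goal)
    for step in range(n_steps, 0, -1):
        mean = toy_denoise_mean(traj)
        # Guidance: shift the denoised mean along the gradient of the return.
        grad = reward_grad(mean, target)
        mean = [m + guide_scale * g for m, g in zip(mean, grad)]
        # Assumed noise schedule: shrinks linearly and is zero at the last step.
        sigma = 0.1 * (step - 1) / n_steps
        traj = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        # Re-impose the conditions after every denoising step.
        traj = apply_conditions(traj, start, goal)
    return traj
```

Calling `plan(0.0, 1.0)` yields a trajectory pinned to the given start and goal, while a positive `guide_scale` with a chosen `target` pulls the interior states toward that target. In this sketch, sampling and planning are the same loop, which is the property the abstract emphasizes.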

Michael Janner, Yilun Du, Joshua B. Tenenbaum, Sergey Levine • 2022

Related benchmarks

Task	Dataset	Result
Offline Reinforcement Learning	D4RL halfcheetah-medium-expert	Normalized Score 88.9	169
Offline Reinforcement Learning	D4RL hopper-medium-expert	Normalized Score 107.2	161
Offline Reinforcement Learning	D4RL walker2d-medium-expert	Normalized Score 108.4	140
Offline Reinforcement Learning	D4RL Medium-Replay Hopper	Normalized Score 96.8	109
Offline Reinforcement Learning	D4RL Medium HalfCheetah	Normalized Score 44.2	105
Offline Reinforcement Learning under Gravity Shift	MuJoCo HalfCheetah	Normalized Return 48.45	104
Offline Reinforcement Learning under Gravity Shift	MuJoCo Hopper	Normalized Return 34.92	104
Offline Reinforcement Learning under Gravity Shift	MuJoCo Ant	Normalized Return 35.84	104
Offline Reinforcement Learning	D4RL Medium Walker2d	Normalized Score 79.7	104
Offline Reinforcement Learning	MuJoCo HalfCheetah	Normalized Return 61.98	97

Showing 10 of 316 rows
