Diffusion Guidance Is a Controllable Policy Improvement Operator

About

At the core of reinforcement learning is the idea of learning beyond the performance of the data. However, scaling such systems has proven notoriously tricky. In contrast, techniques from generative modeling have proven remarkably scalable and are simple to train. In this work, we combine these strengths by deriving a direct relation between policy improvement and guidance of diffusion models. The resulting framework, CFGRL, is trained with the simplicity of supervised learning, yet can further improve on the policies in the data. On offline RL tasks, we observe a reliable trend: increased guidance weighting leads to increased performance. Of particular importance, CFGRL can operate without explicitly learning a value function, allowing us to generalize simple supervised methods (e.g., goal-conditioned behavioral cloning) to further prioritize optimality, gaining performance "for free" across the board.

Kevin Frans, Seohong Park, Pieter Abbeel, Sergey Levine • 2025
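The guidance the abstract refers to is classifier-free guidance, which at sampling time mixes a conditional and an unconditional prediction from the same network. Below is a minimal, hypothetical Python sketch of that weighting rule in a diffusion-policy setting; eps_model, its conditioning signal, and the schedule constants are toy placeholders rather than the paper's implementation, and serve only to show how a single weight w controls the degree of improvement.

```python
import numpy as np

rng = np.random.default_rng(0)

def eps_model(x_t, t, cond):
    # Toy stand-in for a trained noise-prediction network; NOT the
    # paper's architecture. Passing cond=None plays the role of the
    # null token used in classifier-free guidance training, so one
    # network covers both the unconditional (behavior) branch and the
    # conditional ("optimality"-conditioned) branch.
    shift = 0.0 if cond is None else 0.5 * cond
    return 0.1 * x_t - shift

def guided_eps(x_t, t, cond, w):
    # Classifier-free guidance combination at sampling time:
    #   eps = eps_uncond + w * (eps_cond - eps_uncond)
    # w = 0 recovers the unconditional behavior policy, w = 1 the
    # conditioned policy, and w > 1 extrapolates further toward the
    # conditioned branch, i.e. stronger policy improvement.
    eps_u = eps_model(x_t, t, None)
    eps_c = eps_model(x_t, t, cond)
    return eps_u + w * (eps_c - eps_u)

# One illustrative DDPM-style denoising step on a toy action vector;
# the noise-schedule constants are placeholders, not tuned values.
x_t = rng.normal(size=3)
eps = guided_eps(x_t, t=10, cond=1.0, w=3.0)
alpha, alpha_bar = 0.99, 0.5
x_prev = (x_t - (1.0 - alpha) / np.sqrt(1.0 - alpha_bar) * eps) / np.sqrt(alpha)
print(x_prev)
```

In this reading, sweeping w trades off staying close to the data distribution (w near 1) against pushing further toward the conditioned, improved policy, which matches the reported trend that larger guidance weights yield higher performance.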

Related benchmarks

Task                                        | Dataset                                   | Success Rate (%) | Rank
Robot goal-reaching success rate evaluation | OGBench cube-single-play-singletask       | 14               | 13
Robot goal-reaching success rate evaluation | OGBench cube-double-play-singletask       | 3                | 13
Robot goal-reaching success rate evaluation | OGBench scene-play-sparse-singletask      | 42               | 13
Robot goal-reaching success rate evaluation | OGBench puzzle-4x4-play-sparse-singletask | 1                | 13
Robot goal-reaching success rate evaluation | OGBench puzzle-3x3-play-sparse-singletask | 2                | 13
Robot goal-reaching success rate evaluation | OGBench visual-*-task1                    | 46               | 5
