
DIME: Diffusion-Based Maximum Entropy Reinforcement Learning

About

Maximum entropy reinforcement learning (MaxEnt-RL) has become the standard approach to RL due to its beneficial exploration properties. Traditionally, policies are parameterized using Gaussian distributions, which significantly limits their representational capacity. Diffusion-based policies offer a more expressive alternative, yet integrating them into MaxEnt-RL poses challenges, primarily because their marginal entropy is intractable to compute. To overcome this, we propose Diffusion-Based Maximum Entropy RL (DIME). DIME leverages recent advances in approximate inference with diffusion models to derive a lower bound on the maximum entropy objective. Additionally, we propose a policy iteration scheme that provably converges to the optimal diffusion policy. Our method enables the use of expressive diffusion-based policies while retaining the principled exploration benefits of MaxEnt-RL, significantly outperforming other diffusion-based methods on challenging high-dimensional control benchmarks. It is also competitive with state-of-the-art non-diffusion-based RL methods while requiring fewer algorithmic design choices and smaller update-to-data ratios, which reduces computational complexity.
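The maximum entropy objective mentioned above can be sketched for a single trajectory as a discounted sum of reward plus a temperature-weighted entropy bonus. This is a minimal illustration assuming the common soft-actor-critic-style formulation; the function names and the alpha (temperature) and gamma (discount) defaults are hypothetical and not taken from the DIME paper. It also shows why Gaussian policies are convenient: their entropy has a closed form.

```python
import math

# Illustrative sketch only: a discounted maximum-entropy (soft) return for one
# trajectory, using a diagonal Gaussian policy whose entropy has a closed form.
# Function names and the alpha/gamma defaults are hypothetical, not from DIME.


def gaussian_entropy(log_stds):
    """Differential entropy of a diagonal Gaussian: sum_i 0.5*log(2*pi*e) + log(sigma_i)."""
    return sum(0.5 * math.log(2.0 * math.pi * math.e) + ls for ls in log_stds)


def soft_return(rewards, entropies, alpha=0.2, gamma=0.99):
    """Discounted MaxEnt return: sum_t gamma**t * (r_t + alpha * H_t)."""
    if len(rewards) != len(entropies):
        raise ValueError("rewards and entropies must have the same length")
    return sum(
        (gamma ** t) * (r + alpha * h)
        for t, (r, h) in enumerate(zip(rewards, entropies))
    )


if __name__ == "__main__":
    # A 2-D action space with sigma = 1 in each dimension (log sigma = 0).
    h = gaussian_entropy([0.0, 0.0])
    print(round(h, 4))  # 2.8379
    print(soft_return([1.0, 1.0, 1.0], [h, h, h]))
```

A diffusion policy produces actions by iteratively denoising random noise, so its marginal action density, and therefore the entropy term H_t, has no such closed form. According to the abstract, DIME replaces this term with a lower bound; the sketch above does not attempt to reproduce that bound.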

Onur Celik, Zechu Li, Denis Blessing, Ge Li, Daniel Palenicek, Jan Peters, Georgia Chalvatzaki, Gerhard Neumann • 2025

Related benchmarks

Task                     Dataset                Metric           Result    Rank
Continuous control       MuJoCo Ant v4          Average Return   7.10e+3   46
Continuous control       MuJoCo Walker2d v4     --               --        39
Continuous control       MuJoCo HalfCheetah v4  Average Return   1.35e+4   36
Continuous control       MuJoCo Swimmer v4      Total Reward     118.8     19
Continuous control       Ant v4                 Average Return   7.10e+3   15
Continuous control       DMC Dog                Dog Stand IQM    96.8      7
Musculoskeletal control  MyoSuite               Reach Hard IQM   90        7
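Two of the rows above report an IQM (interquartile mean), an aggregate commonly used in deep-RL evaluation (e.g. in the rliable library). The following is a minimal sketch assuming the usual definition, the mean of the middle 50% of scores; the function name is hypothetical, and this listing does not state how many runs or tasks, or which score normalization, underlie the reported values.

```python
def interquartile_mean(scores):
    """Mean of the middle 50% of a list of scores.

    Trims floor(n / 4) values from each end of the sorted scores, intended to
    follow the same convention as scipy.stats.trim_mean(scores, 0.25).
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    ordered = sorted(scores)
    k = len(ordered) // 4  # number of values dropped from each tail
    middle = ordered[k:len(ordered) - k]
    return sum(middle) / len(middle)


# Example: a single outlier run does not dominate the aggregate.
runs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 100.0]
print(interquartile_mean(runs))  # 3.5
```

Compared with a plain mean (15.125 for the example runs), the IQM is far less sensitive to a few unusually good or bad seeds, while still using more of the data than the median.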
