DIME: Diffusion-Based Maximum Entropy Reinforcement Learning

About

Maximum entropy reinforcement learning (MaxEnt-RL) has become the standard approach to RL due to its beneficial exploration properties. Traditionally, policies are parameterized using Gaussian distributions, which significantly limits their representational capacity. Diffusion-based policies offer a more expressive alternative, yet integrating them into MaxEnt-RL poses challenges, primarily because their marginal entropy is intractable to compute. To overcome this, we propose Diffusion-Based Maximum Entropy RL (DIME). DIME leverages recent advances in approximate inference with diffusion models to derive a lower bound on the maximum entropy objective. Additionally, we propose a policy iteration scheme that provably converges to the optimal diffusion policy. Our method enables the use of expressive diffusion-based policies while retaining the principled exploration benefits of MaxEnt-RL, significantly outperforming other diffusion-based methods on challenging high-dimensional control benchmarks. It is also competitive with state-of-the-art non-diffusion-based RL methods while requiring fewer algorithmic design choices and smaller update-to-data ratios, reducing computational complexity.
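
For context, the entropy bonus that MaxEnt-RL adds to the reward has a closed form when the policy is Gaussian, which is one reason Gaussians are the traditional choice; the abstract's point is that no comparable closed form is available for the marginal of a diffusion policy. The sketch below is not taken from the paper. It computes the entropy of a diagonal Gaussian and a single-step entropy-regularized reward; the function names and the temperature `alpha` are illustrative assumptions.

```python
import math


def diag_gaussian_entropy(log_stds):
    """Differential entropy (in nats) of a diagonal Gaussian.

    H = sum_i (0.5 * log(2*pi*e) + log_std_i); the mean does not affect it.
    """
    const = 0.5 * math.log(2.0 * math.pi * math.e)
    return sum(const + ls for ls in log_stds)


def soft_reward(reward, entropy, alpha=0.2):
    """Per-step MaxEnt reward r + alpha * H.

    alpha is a hypothetical temperature value chosen for illustration.
    """
    return reward + alpha * entropy


if __name__ == "__main__":
    # Two action dimensions, each with unit standard deviation (log_std = 0).
    h = diag_gaussian_entropy([0.0, 0.0])
    print("entropy:", h)
    print("soft reward:", soft_reward(1.0, h))
```

A diffusion policy defines its action distribution only implicitly through a multi-step sampling process, so a function like `diag_gaussian_entropy` has no direct counterpart there; per the abstract, DIME optimizes a lower bound on the objective instead.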

Onur Celik, Zechu Li, Denis Blessing, Ge Li, Daniel Palenicek, Jan Peters, Georgia Chalvatzaki, Gerhard Neumann • 2025

Related benchmarks

Task	Dataset	Result
Continuous Control	MuJoCo Walker2d v4	--	51
Continuous Control	MuJoCo Ant v4	Average Return: 7.10e+3	46
Continuous Control	MuJoCo HalfCheetah v4	Average Return: 1.35e+4	36
Continuous Control	MuJoCo Swimmer v4	Total Reward: 118.8	19
Continuous Control	Ant v4	Average Return: 7.10e+3	15
Continuous Control	DMC Dog	Dog Stand IQM: 96.8	7
Musculoskeletal control	MyoSuite	Reach Hard IQM: 90	7
