Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning

About

Guided sampling is a vital approach for applying diffusion models to real-world tasks: it embeds human-defined guidance into the sampling procedure. This paper considers a general setting where the guidance is defined by an (unnormalized) energy function. The main challenge in this setting is that the intermediate guidance during the diffusion sampling procedure, which is jointly defined by the sampling distribution and the energy function, is unknown and hard to estimate. To address this challenge, we propose an exact formulation of the intermediate guidance as well as a novel training objective named contrastive energy prediction (CEP) to learn the exact guidance. Our method is guaranteed to converge to the exact guidance under unlimited model capacity and data samples, while previous methods cannot. We demonstrate the effectiveness of our method by applying it to offline reinforcement learning (RL). Extensive experiments on D4RL benchmarks demonstrate that our method outperforms existing state-of-the-art algorithms. We also provide some examples of applying CEP to image synthesis to demonstrate the scalability of CEP on high-dimensional data.
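The abstract names the contrastive energy prediction objective without stating it. As a rough illustration only, a contrastive objective of this kind can be sketched as a soft-label cross-entropy over a batch of K samples: the model's predicted energies on noised samples define one softmax distribution, and the known energies on the clean samples define the target weights. The function name `contrastive_energy_loss`, the inverse-temperature parameter `beta`, and the normalization of the target weights are assumptions made for this sketch and are not taken from the page; the paper's exact objective may differ.

```python
import numpy as np

def contrastive_energy_loss(pred_energy, true_energy, beta=1.0):
    """Soft-label cross-entropy between two softmax distributions over K samples.

    pred_energy: shape (K,), hypothetical model outputs on noised samples.
    true_energy: shape (K,), energies of the corresponding clean samples.
    beta: assumed inverse temperature scaling the true energies.
    """
    pred_energy = np.asarray(pred_energy, dtype=float)
    true_energy = np.asarray(true_energy, dtype=float)

    # Numerically stable log-softmax of the negated predicted energies.
    logits = -pred_energy
    logits = logits - logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())

    # Target weights proportional to exp(-beta * E); normalizing them
    # is an assumption of this sketch.
    w_logits = -beta * true_energy
    w_logits -= w_logits.max()
    weights = np.exp(w_logits) / np.exp(w_logits).sum()

    # Cross-entropy between target weights and predicted distribution.
    return float(-(weights * log_probs).sum())
```

Under these assumptions the loss is smallest when the predicted energies equal `beta` times the true energies up to an additive constant, which matches the intuition that the guidance is defined by an unnormalized energy.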

Cheng Lu, Huayu Chen, Jianfei Chen, Hang Su, Chongxuan Li, Jun Zhu • 2023

Related benchmarks

Task	Dataset	Result	(unlabeled in source)
Offline Reinforcement Learning	D4RL halfcheetah-medium-expert	--	169
Offline Reinforcement Learning	D4RL hopper-medium-expert	--	161
Offline Reinforcement Learning	D4RL MuJoCo halfcheetah-medium-expert	Normalized Score 93.5	54
Offline Reinforcement Learning	D4RL MuJoCo hopper-medium-expert	Normalized Score 108	47
Offline Reinforcement Learning	D4RL MuJoCo walker2d-medium-expert	Normalized Score 110.7	47
Offline Reinforcement Learning	D4RL MuJoCo halfcheetah-medium-replay	Normalized Score 0.476	47
Offline Reinforcement Learning	D4RL MuJoCo hopper-medium-replay	Normalized Score 96.9	42
Offline Reinforcement Learning	D4RL Hopper medium	Reward 98	35
Offline Reinforcement Learning	D4RL MuJoCo walker2d-medium	Normalized Score 86	33
Offline Reinforcement Learning	D4RL MuJoCo halfcheetah-medium	Normalized Score 54.1	33

Showing 10 of 24 rows
