Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning

About

Score-based generative models like the diffusion model have been testified to be effective in modeling multi-modal data from image generation to reinforcement learning (RL). However, the inference process of diffusion model can be slow, which hinders its usage in RL with iterative sampling. We propose to apply the consistency model as an efficient yet expressive policy representation, namely consistency policy, with an actor-critic style algorithm for three typical RL settings: offline, offline-to-online and online. For offline RL, we demonstrate the expressiveness of generative models as policies from multi-modal data. For offline-to-online RL, the consistency policy is shown to be more computational efficient than diffusion policy, with a comparable performance. For online RL, the consistency policy demonstrates significant speedup and even higher average performances than the diffusion policy.

Zihan Ding, Chi Jin• 2023

Related benchmarks

Task	Dataset	Result
Offline Reinforcement Learning	D4RL halfcheetah-medium-expert	Normalized Score84.3	169
Offline Reinforcement Learning	D4RL hopper-medium-expert	Normalized Score100.4	161
Offline Reinforcement Learning	D4RL walker2d-medium-expert	Normalized Score110.4	132
Offline Reinforcement Learning	D4RL Medium HalfCheetah	Normalized Score69.1	105
Offline Reinforcement Learning	D4RL Medium Walker2d	Normalized Score83.1	104
Offline Reinforcement Learning	D4RL antmaze-umaze (diverse)	Normalized Score77.6	74
Offline Reinforcement Learning	D4RL Medium Hopper	Normalized Score80.7	72
Offline Reinforcement Learning	D4RL AntMaze	AntMaze Umaze Return66	65
Offline Reinforcement Learning	OGBench	AntMaze Giant Navigate0.00e+0	56
Offline Reinforcement Learning	D4RL Adroit pen (human)	Normalized Return64	53

Showing 10 of 83 rows

...

Other info

Follow for update

@wizwand_team Discord