
Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning

About

Score-based generative models such as the diffusion model have been shown to be effective at modeling multi-modal data, from image generation to reinforcement learning (RL). However, the inference process of the diffusion model can be slow due to iterative sampling, which hinders its use in RL. We propose to apply the consistency model as an efficient yet expressive policy representation, namely the consistency policy, with an actor-critic style algorithm for three typical RL settings: offline, offline-to-online, and online. For offline RL, we demonstrate the expressiveness of generative models as policies learned from multi-modal data. For offline-to-online RL, the consistency policy is shown to be more computationally efficient than the diffusion policy, with comparable performance. For online RL, the consistency policy demonstrates significant speedup and even higher average performance than the diffusion policy.
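The efficiency gap the abstract describes comes down to network evaluations per sampled action: a diffusion policy denoises iteratively over many sequential forward passes, while a consistency policy maps noise to an action in a single call. The toy sketch below illustrates this contrast; the linear "network" `W`, the step size, and the update rule are illustrative stand-ins, not the paper's actual architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained network mapping states to actions;
# weights are random here purely for illustration (state_dim=4, action_dim=2).
W = rng.normal(size=(4, 2))

def consistency_policy(state, noise):
    """One-step sampling: a single forward pass maps noise to an action."""
    return np.tanh(state @ W + noise)

def diffusion_policy(state, noise, n_steps=20):
    """Iterative sampling: n_steps sequential denoising-style updates."""
    x = noise
    for _ in range(n_steps):
        # Toy contraction toward a target action; real diffusion policies
        # instead follow a learned reverse (score-based) process.
        x = x + 0.1 * (np.tanh(state @ W) - x)
    return x

state = rng.normal(size=4)
noise = rng.normal(size=2)

a_cm = consistency_policy(state, noise)  # 1 network evaluation
a_dm = diffusion_policy(state, noise)    # 20 sequential evaluations
```

With the same policy network, the consistency policy's cost per action is constant, while the diffusion policy's scales linearly with the number of denoising steps; this is the source of the speedup reported for the online setting.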

Zihan Ding, Chi Jin · 2023

Related benchmarks

All results below are for the Offline Reinforcement Learning task.

Dataset | Metric | Score | Rank
D4RL halfcheetah-medium-expert | Normalized Score | 84.3 | 155
D4RL hopper-medium-expert | Normalized Score | 100.4 | 153
D4RL walker2d-medium-expert | Normalized Score | 110.4 | 124
D4RL Medium HalfCheetah | Normalized Score | 69.1 | 97
D4RL Medium Walker2d | Normalized Score | 83.1 | 96
D4RL AntMaze | AntMaze Umaze Return | 66 | 65
D4RL Medium Hopper | Normalized Score | 80.7 | 64
D4RL antmaze-umaze (diverse) | Normalized Score | 77.6 | 47
D4RL Adroit pen (human) | Normalized Return | 64 | 39
D4RL Adroit pen (cloned) | Normalized Return | 56 | 39

Showing 10 of 41 rows.
