
Model-based Offline Reinforcement Learning with Count-based Conservatism

About

In this paper, we propose a model-based offline reinforcement learning method that integrates count-based conservatism, named $\texttt{Count-MORL}$. Our method uses count estimates of state-action pairs to quantify model estimation error and is, to the best of our knowledge, the first algorithm to demonstrate the efficacy of count-based conservatism in model-based offline deep RL. First, we show that the estimation error is inversely proportional to the frequency of state-action pairs. Second, we demonstrate that the policy learned under the count-based conservative model offers near-optimality performance guarantees. Through extensive numerical experiments, we validate that $\texttt{Count-MORL}$ with a hash code implementation significantly outperforms existing offline RL algorithms on the D4RL benchmark datasets. The code is available at https://github.com/oh-lab/Count-MORL.
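The idea of count-based conservatism described above can be illustrated with a small sketch. This is not the authors' implementation: the random-projection sign hash, the class name `HashCounter`, the function `penalized_reward`, the coefficient `beta`, and the specific penalty form `beta / sqrt(count + 1)` are all assumptions chosen for illustration. The only property taken from the abstract is that rarely visited state-action pairs should be treated more conservatively than frequently visited ones.

```python
import numpy as np


class HashCounter:
    """Approximate state-action visit counts via sign hashing (hypothetical helper)."""

    def __init__(self, input_dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random projection; the sign pattern of the projection is the hash code.
        self.projection = rng.standard_normal((input_dim, n_bits))
        self.counts = {}

    def _codes(self, states, actions):
        x = np.concatenate([states, actions], axis=1)
        bits = (x @ self.projection) > 0
        return [row.tobytes() for row in bits]

    def update(self, states, actions):
        # Increment the count of each hash bucket seen in the offline dataset.
        for code in self._codes(states, actions):
            self.counts[code] = self.counts.get(code, 0) + 1

    def count(self, states, actions):
        # Unseen buckets have count 0.
        codes = self._codes(states, actions)
        return np.array([self.counts.get(c, 0) for c in codes], dtype=float)


def penalized_reward(rewards, counts, beta=1.0):
    # Assumed penalty form: larger for rarely seen pairs, shrinking as counts grow.
    return rewards - beta / np.sqrt(counts + 1.0)


# Usage: three identical dataset transitions, then query a seen and an unseen pair.
counter = HashCounter(input_dim=3)
counter.update(np.ones((3, 2)), np.ones((3, 1)))
query_states = np.array([[1.0, 1.0], [-1.0, -1.0]])
query_actions = np.array([[1.0], [-1.0]])
n = counter.count(query_states, query_actions)
r = penalized_reward(np.array([1.0, 1.0]), n, beta=1.0)
```

In this sketch, the pair that appears in the dataset receives a smaller penalty than the unseen pair, so a policy trained on model rollouts with `penalized_reward` would be discouraged from regions the data does not cover.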

Byeongchan Kim, Min-hwan Oh • 2023

Related benchmarks

Task                            Dataset                          Metric            Result  Rank
Offline Reinforcement Learning  D4RL halfcheetah-medium-expert   Normalized Score  100     155
Offline Reinforcement Learning  D4RL hopper-medium-expert        Normalized Score  111.4   153
Offline Reinforcement Learning  D4RL walker2d-medium-expert      Normalized Score  112.3   124
Offline Reinforcement Learning  D4RL Medium HalfCheetah          Normalized Score  76.5    97
Offline Reinforcement Learning  D4RL Medium-Replay Hopper        Normalized Score  101.7   97
Offline Reinforcement Learning  D4RL Medium Walker2d             Normalized Score  87.6    96
Offline Reinforcement Learning  D4RL walker2d-random             Normalized Score  21.9    93
Offline Reinforcement Learning  D4RL halfcheetah-random          Normalized Score  41      86
Offline Reinforcement Learning  D4RL Medium-Replay HalfCheetah   Normalized Score  71.5    84
Offline Reinforcement Learning  D4RL hopper-random               Normalized Score  30.7    78
Showing 10 of 22 rows
