Model-based Offline Reinforcement Learning with Count-based Conservatism
About
In this paper, we propose a model-based offline reinforcement learning method that integrates count-based conservatism, named $\texttt{Count-MORL}$. Our method uses count estimates of state-action pairs to quantify model estimation error and is, to the best of our knowledge, the first algorithm to demonstrate the efficacy of count-based conservatism in model-based offline deep RL. We first show that the estimation error is inversely proportional to the frequency of state-action pairs. Second, we show that the policy learned under the count-based conservative model offers near-optimality guarantees. Through extensive numerical experiments, we validate that $\texttt{Count-MORL}$ with a hash code implementation significantly outperforms existing offline RL algorithms on the D4RL benchmark datasets. The code is available at https://github.com/oh-lab/Count-MORL.
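To make the idea of hash-based counting concrete, here is a minimal sketch, not the authors' implementation. It assumes a sign-of-random-projection hash (SimHash style) to discretize continuous state-action pairs, and a reward penalty of the form `beta / sqrt(count + 1)`. The class `HashCounter`, the function `penalized_reward`, the parameter names, and the exact penalty form are all hypothetical choices made for illustration; the repository linked above is the reference for the actual method.

```python
import math
import random
from collections import Counter


class HashCounter:
    """Counts state-action pairs through binary hash codes (illustrative only)."""

    def __init__(self, input_dim, code_bits=16, seed=0):
        rng = random.Random(seed)
        # One Gaussian projection vector per bit of the hash code.
        self.projections = [
            [rng.gauss(0.0, 1.0) for _ in range(input_dim)]
            for _ in range(code_bits)
        ]
        self.counts = Counter()

    def code(self, state, action):
        """Map a (state, action) pair to a tuple of bits."""
        x = list(state) + list(action)
        if len(x) != len(self.projections[0]):
            raise ValueError("state and action sizes must sum to input_dim")
        # Each bit is the sign of one random projection of the input.
        return tuple(
            1 if sum(w * v for w, v in zip(row, x)) >= 0.0 else 0
            for row in self.projections
        )

    def update(self, state, action):
        """Record one occurrence of the pair in the offline dataset."""
        self.counts[self.code(state, action)] += 1

    def count(self, state, action):
        """Return how often the pair's hash code has been seen (0 if never)."""
        return self.counts[self.code(state, action)]


def penalized_reward(reward, count, beta=1.0):
    """Subtract a penalty that shrinks as the count grows.

    Assumed form: beta / sqrt(count + 1). Rarely seen pairs get a large
    penalty, frequently seen pairs a small one.
    """
    return reward - beta / math.sqrt(count + 1)


# Usage: count pairs from a toy dataset, then penalize a model rollout reward.
counter = HashCounter(input_dim=3)
for _ in range(3):
    counter.update(state=[0.5, -1.0], action=[0.2])
n = counter.count([0.5, -1.0], [0.2])
r_conservative = penalized_reward(1.0, n, beta=1.0)
```

Nearby inputs tend to share a hash code, so the count acts as a coarse density estimate over the dataset; the number of bits controls how coarse. Distinct pairs can collide on the same code, so the counts are estimates rather than exact frequencies.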
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | D4RL halfcheetah-medium-expert | Normalized Score | 100 | 155 |
| Offline Reinforcement Learning | D4RL hopper-medium-expert | Normalized Score | 111.4 | 153 |
| Offline Reinforcement Learning | D4RL walker2d-medium-expert | Normalized Score | 112.3 | 124 |
| Offline Reinforcement Learning | D4RL Medium HalfCheetah | Normalized Score | 76.5 | 97 |
| Offline Reinforcement Learning | D4RL Medium-Replay Hopper | Normalized Score | 101.7 | 97 |
| Offline Reinforcement Learning | D4RL Medium Walker2d | Normalized Score | 87.6 | 96 |
| Offline Reinforcement Learning | D4RL walker2d-random | Normalized Score | 21.9 | 93 |
| Offline Reinforcement Learning | D4RL halfcheetah-random | Normalized Score | 41 | 86 |
| Offline Reinforcement Learning | D4RL Medium-Replay HalfCheetah | Normalized Score | 71.5 | 84 |
| Offline Reinforcement Learning | D4RL hopper-random | Normalized Score | 30.7 | 78 |
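
For reading the table, recall the D4RL convention for the normalized score: a policy's raw return is rescaled so that a random policy scores 0 and an expert policy scores 100. The symbols $R$, $R_{\text{random}}$, and $R_{\text{expert}}$ below are notation introduced here for illustration; the per-environment reference returns are defined by the D4RL benchmark, not by this page.

$$
\text{Normalized Score} = 100 \times \frac{R - R_{\text{random}}}{R_{\text{expert}} - R_{\text{random}}}
$$

Under this convention, a value above 100, such as 111.4 on hopper-medium-expert, means the learned policy's return exceeds the expert reference return for that environment.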