
Grid-Mapping Pseudo-Count Constraint for Offline Reinforcement Learning

About

Offline reinforcement learning learns from a static dataset without interacting with the environment, which ensures safety and gives it good prospects for practical application. However, directly applying a naive reinforcement learning algorithm usually fails in the offline setting because out-of-distribution (OOD) state-action pairs cause inaccurate Q-value approximation. Penalizing the Q-values of OOD state-action pairs is an effective way to address this problem. Among such penalty methods, count-based methods have achieved good results in discrete domains while keeping a simple form. Inspired by them, this paper proposes the Grid-Mapping Pseudo-Count method (GPC), a novel pseudo-count method that extends the count-based approach from discrete to continuous domains. First, the continuous state and action spaces are mapped to a discrete space using Grid-Mapping, and the Q-values of OOD state-action pairs are then constrained through pseudo-counts. Second, a theoretical proof shows that GPC obtains appropriate uncertainty constraints under fewer assumptions than other pseudo-count methods. Third, GPC is combined with the Soft Actor-Critic algorithm (SAC) to obtain a new algorithm called GPC-SAC. Finally, experiments on D4RL datasets show that GPC-SAC achieves better performance at lower computational cost than other algorithms that constrain the Q-value.
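
The grid-mapping and pseudo-count idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `GridPseudoCounter`, the uniform binning over assumed per-dimension bounds `low` and `high`, the `bins_per_dim` parameter, and the penalty form `beta / sqrt(n + 1)` are all assumptions made here for illustration. The abstract states only that continuous state-action pairs are mapped to a discrete grid and that pseudo-counts constrain the Q-values of OOD pairs.

```python
import numpy as np
from collections import Counter


class GridPseudoCounter:
    """Hypothetical helper: counts dataset (state, action) pairs per grid cell."""

    def __init__(self, low, high, bins_per_dim):
        # Assumed per-dimension bounds of the concatenated (state, action) vector.
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        self.bins = int(bins_per_dim)
        self.counts = Counter()

    def cell(self, state, action):
        """Map a continuous (state, action) pair to a tuple of integer cell indices."""
        x = np.concatenate([np.ravel(state), np.ravel(action)]).astype(float)
        # Rescale to [0, 1], clipping values that fall outside the assumed bounds.
        unit = np.clip((x - self.low) / (self.high - self.low), 0.0, 1.0)
        # Uniform bins; the upper boundary is folded into the last bin.
        idx = np.minimum((unit * self.bins).astype(int), self.bins - 1)
        return tuple(idx.tolist())

    def add(self, state, action):
        """Record one dataset sample."""
        self.counts[self.cell(state, action)] += 1

    def count(self, state, action):
        """Pseudo-count: number of dataset samples sharing this pair's cell."""
        return self.counts[self.cell(state, action)]


def uncertainty_penalty(n, beta=1.0):
    """Assumed penalty form: large for unvisited cells, shrinking as counts grow."""
    return beta / np.sqrt(n + 1.0)


def penalized_q(q_value, n, beta=1.0):
    """Subtract the count-based penalty from a Q-value estimate."""
    return q_value - uncertainty_penalty(n, beta)


# Toy usage: 2-D states and 1-D actions, all bounded in [-1, 1].
counter = GridPseudoCounter(low=[-1, -1, -1], high=[1, 1, 1], bins_per_dim=4)
for _ in range(3):
    counter.add(state=[0.1, 0.1], action=[0.1])

seen = counter.count([0.2, 0.3], [0.4])       # falls in the same cell as the data
unseen = counter.count([-0.9, -0.9], [-0.9])  # falls in a cell with no data
```

In this toy setup, a pair that falls in a cell never visited by the dataset has pseudo-count 0 and receives the largest penalty, while pairs in well-covered cells are penalized less. In an actor-critic method such as SAC, a term of this kind would plausibly be subtracted from the Q-values or Q-targets of sampled actions; the exact placement used in GPC-SAC is not given in the abstract.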

Yi Shen, Hanyan Huang • 2024

Related benchmarks

Task | Dataset | Result | Rank
Offline Reinforcement Learning | hopper medium | Normalized Score 82.9 | 68
Offline Reinforcement Learning | walker2d medium-replay | Normalized Score 86.2 | 61
Offline Reinforcement Learning | walker2d medium | Normalized Score 87.6 | 61
Offline Reinforcement Learning | hopper medium-replay | Normalized Score 97.5 | 55
Offline Reinforcement Learning | halfcheetah medium-replay | Normalized Score 55.7 | 54
Offline Reinforcement Learning | halfcheetah medium | Normalized Score 60.8 | 53
Offline Reinforcement Learning | Walker2d medium-expert | Normalized Score 111.7 | 42
Offline Reinforcement Learning | Maze2D umaze | Normalized Return 141 | 38
Offline Reinforcement Learning | Maze2D medium | Normalized Return 103.7 | 38
Offline Reinforcement Learning | D4RL Walker2d expert | Mean Normalized Score 111.7 | 38
Showing 10 of 20 rows
