Grid-Mapping Pseudo-Count Constraint for Offline Reinforcement Learning

About

Offline reinforcement learning learns from a static dataset without interacting with the environment, which makes it safe and therefore attractive for real-world applications. However, naively applying standard reinforcement learning algorithms in the offline setting usually fails, because out-of-distribution (OOD) state-actions lead to inaccurate Q-value estimates. Penalizing the Q-values of OOD state-actions is an effective remedy. Among such penalization methods, count-based approaches achieve good results in discrete domains with a simple formulation. Inspired by this, a novel pseudo-count method for continuous domains, the Grid-Mapping Pseudo-Count method (GPC), is proposed by extending the count-based approach from discrete to continuous domains. First, the continuous state and action spaces are mapped to discrete spaces using Grid-Mapping, and the Q-values of OOD state-actions are then constrained through the resulting pseudo-counts. Second, a theoretical proof shows that GPC obtains appropriate uncertainty constraints under fewer assumptions than other pseudo-count methods. Third, GPC is combined with the Soft Actor-Critic algorithm (SAC) to obtain a new algorithm, GPC-SAC. Finally, experiments on D4RL datasets show that GPC-SAC achieves better performance at lower computational cost than other algorithms that constrain the Q-value.
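The core idea of the abstract, mapping continuous state-actions onto a discrete grid and deriving an uncertainty penalty from cell visit counts, can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the class name, the `1/sqrt(n+1)` penalty form, and the `beta` scale are assumptions for exposition.

```python
import numpy as np

class GridPseudoCount:
    """Illustrative grid-mapping pseudo-count for continuous state-actions.
    (Names and penalty form are assumptions, not the paper's exact method.)"""

    def __init__(self, low, high, bins=10):
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        self.bins = bins
        self.counts = {}  # grid cell -> visit count from the offline dataset

    def _cell(self, x):
        # Map a continuous state-action vector to a discrete grid cell index.
        ratio = (np.asarray(x, dtype=float) - self.low) / (self.high - self.low)
        idx = np.clip((ratio * self.bins).astype(int), 0, self.bins - 1)
        return tuple(idx)

    def update(self, x):
        # Count dataset state-actions falling in each cell.
        c = self._cell(x)
        self.counts[c] = self.counts.get(c, 0) + 1

    def penalty(self, x, beta=1.0):
        # Uncertainty penalty: large for rarely visited (likely OOD) cells,
        # shrinking as the pseudo-count grows.
        n = self.counts.get(self._cell(x), 0)
        return beta / np.sqrt(n + 1.0)
```

In a GPC-SAC-style setup, such a penalty would be subtracted from the critic's target value, so Q-estimates for state-actions outside the data support are pushed down while well-covered regions are barely affected.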

Yi Shen, Hanyan Huang • 2024

Related benchmarks

Task                           | Dataset                   | Result                 | Rank
Offline Reinforcement Learning | hopper medium             | Normalized Score 82.9  | 52
Offline Reinforcement Learning | walker2d medium           | Normalized Score 87.6  | 51
Offline Reinforcement Learning | walker2d medium-replay    | Normalized Score 86.2  | 50
Offline Reinforcement Learning | hopper medium-replay      | Normalized Score 97.5  | 44
Offline Reinforcement Learning | halfcheetah medium        | Normalized Score 60.8  | 43
Offline Reinforcement Learning | halfcheetah medium-replay | Normalized Score 55.7  | 43
Offline Reinforcement Learning | Maze2D umaze              | Normalized Return 141  | 38
Offline Reinforcement Learning | Maze2D medium             | Normalized Return 103.7| 38
Offline Reinforcement Learning | walker2d medium-expert    | Normalized Score 111.7 | 31
Offline Reinforcement Learning | hopper medium-expert      | Normalized Score 111.6 | 24
(Showing 10 of 20 benchmark rows.)
