Offline Reinforcement Learning with Fisher Divergence Critic Regularization

About

Many modern approaches to offline Reinforcement Learning (RL) utilize behavior regularization, typically augmenting a model-free actor critic algorithm with a penalty measuring divergence of the policy from the offline data. In this work, we propose an alternative approach to encouraging the learned policy to stay close to the data, namely parameterizing the critic as the log-behavior-policy, which generated the offline data, plus a state-action value offset term, which can be learned using a neural network. Behavior regularization then corresponds to an appropriate regularizer on the offset term. We propose using a gradient penalty regularizer for the offset term and demonstrate its equivalence to Fisher divergence regularization, suggesting connections to the score matching and generative energy-based model literature. We thus term our resulting algorithm Fisher-BRC (Behavior Regularized Critic). On standard offline RL benchmarks, Fisher-BRC achieves both improved performance and faster convergence over existing state-of-the-art methods.
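The parameterization described above can be sketched in JAX. This is a minimal illustration under stated assumptions, not the authors' implementation: the network sizes, the tiny MLP used for the offset term, and the Gaussian stand-in for the behavior policy are all hypothetical. The critic is written as the offset plus the log-behavior-policy, and the regularizer is the squared norm of the offset's gradient with respect to the action. Which actions the penalty is evaluated on (dataset actions or policy samples) is left to the caller here.

```python
import jax
import jax.numpy as jnp

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, ACTION_DIM, HIDDEN = 3, 2, 16


def init_offset_params(key):
    """Parameters of a small one-hidden-layer MLP for the offset term."""
    k1, k2 = jax.random.split(key)
    return {
        "w1": 0.1 * jax.random.normal(k1, (HIDDEN, STATE_DIM + ACTION_DIM)),
        "b1": jnp.zeros(HIDDEN),
        "w2": 0.1 * jax.random.normal(k2, (HIDDEN,)),
        "b2": jnp.zeros(()),
    }


def offset(params, state, action):
    """State-action value offset O(s, a) for a single (state, action) pair."""
    x = jnp.concatenate([state, action])
    h = jnp.tanh(params["w1"] @ x + params["b1"])
    return params["w2"] @ h + params["b2"]


def log_behavior_policy(state, action):
    """Assumed stand-in for a fitted behavior model: a unit-variance
    Gaussian centered on tanh(state[:ACTION_DIM])."""
    mean = jnp.tanh(state[:ACTION_DIM])
    sq_dist = jnp.sum((action - mean) ** 2)
    return -0.5 * sq_dist - 0.5 * ACTION_DIM * jnp.log(2.0 * jnp.pi)


def critic(params, state, action):
    """Critic parameterized as offset plus log-behavior-policy."""
    return offset(params, state, action) + log_behavior_policy(state, action)


def gradient_penalty(params, states, actions):
    """Batch mean of the squared norm of the offset's action gradient."""
    grad_fn = jax.vmap(jax.grad(offset, argnums=2), in_axes=(None, 0, 0))
    grad_a = grad_fn(params, states, actions)
    return jnp.mean(jnp.sum(grad_a ** 2, axis=-1))


key = jax.random.PRNGKey(0)
k_params, k_s, k_a = jax.random.split(key, 3)
params = init_offset_params(k_params)
states = jax.random.normal(k_s, (4, STATE_DIM))
actions = jax.random.normal(k_a, (4, ACTION_DIM))
penalty = gradient_penalty(params, states, actions)
```

One way to read the stated equivalence with Fisher divergence: for a policy proportional to the exponentiated critic, the action-gradient of its log-density minus that of the log-behavior-policy is exactly the action-gradient of the offset, so penalizing the latter's squared norm penalizes the difference in scores.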

Ilya Kostrikov, Jonathan Tompson, Rob Fergus, Ofir Nachum • 2021

Related benchmarks

Task                            Dataset                       Result (Normalized Score)  Rank
Offline Reinforcement Learning  D4RL walker2d-random          60                         77
Offline Reinforcement Learning  D4RL halfcheetah-random       32.2                       70
Offline Reinforcement Learning  D4RL hopper-random            11.4                       62
hopper locomotion               D4RL hopper medium-replay     94.7                       56
walker2d locomotion             D4RL walker2d medium-replay   73.8                       53
Locomotion                      D4RL walker2d-medium-expert   109.6                      47
Locomotion                      D4RL Halfcheetah medium       47.4                       44
Locomotion                      D4RL Walker2d medium          0.783                      44
hopper locomotion               D4RL Hopper medium            66.2                       38
hopper locomotion               D4RL hopper-medium-expert     91.5                       38
Showing 10 of 101 rows
