A Minimalist Approach to Offline Reinforcement Learning

About

Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data. Due to errors in value estimation from out-of-distribution actions, most offline RL algorithms take the approach of constraining or regularizing the policy with the actions contained in the dataset. Because they are built on pre-existing RL algorithms, the modifications that make an RL algorithm work offline come at the cost of additional complexity. Offline RL algorithms introduce new hyperparameters and often leverage secondary components such as generative models, while also adjusting the underlying RL algorithm. In this paper, we aim to make a deep RL algorithm work offline while making minimal changes. We find that we can match the performance of state-of-the-art offline RL algorithms by simply adding a behavior cloning term to the policy update of an online RL algorithm and normalizing the data. The resulting algorithm is a baseline that is simple to implement and tune, and it more than halves the overall run time by removing the additional computational overhead of previous methods.
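
The two changes named in the abstract (a behavior cloning term in the policy update, and data normalization) can be illustrated with a small sketch. The abstract does not spell out the loss, so the code below assumes a common form: a Q-maximization term weighted by a factor scaled with the average magnitude of Q, plus a mean-squared error between the policy's actions and the dataset actions. The function names and the default values `alpha=2.5` and `eps=1e-3` are assumptions made for illustration. The code uses plain Python with no autodiff, so it shows only the arithmetic, not a training loop.

```python
import math


def normalize_states(states, eps=1e-3):
    """Standardize each state dimension with dataset-wide mean and std.

    `eps` (an assumed default) keeps the division stable when a
    dimension is constant across the dataset.
    """
    n = len(states)
    dims = len(states[0])
    means = [sum(s[d] for s in states) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((s[d] - means[d]) ** 2 for s in states) / n) + eps
        for d in range(dims)
    ]
    normalized = [
        [(s[d] - means[d]) / stds[d] for d in range(dims)] for s in states
    ]
    return normalized, means, stds


def actor_loss_with_bc(q_values, policy_actions, dataset_actions, alpha=2.5):
    """Policy loss sketch: -lam * mean(Q) + MSE(policy action, dataset action).

    q_values:        Q(s, pi(s)) for each state in the batch
    policy_actions:  pi(s) for each state (list of action vectors)
    dataset_actions: the action stored in the dataset for each state
    alpha:           assumed trade-off hyperparameter
    """
    n = len(q_values)
    action_dim = len(dataset_actions[0])
    # Scale the RL term by the average |Q| so alpha is insensitive to the
    # reward scale; in an autodiff framework lam would be a constant.
    mean_abs_q = max(sum(abs(q) for q in q_values) / n, 1e-8)
    lam = alpha / mean_abs_q
    rl_term = -lam * sum(q_values) / n
    # Behavior cloning term: keep the policy close to actions in the data.
    bc_term = sum(
        (p - a) ** 2
        for pa, da in zip(policy_actions, dataset_actions)
        for p, a in zip(pa, da)
    ) / (n * action_dim)
    return rl_term + bc_term
```

In a deep RL codebase, the same expression would be written with tensors and minimized with respect to the policy parameters, while the critic update of the underlying online algorithm stays unchanged.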

Scott Fujimoto, Shixiang Shane Gu • 2021

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Offline Reinforcement Learning | D4RL halfcheetah-medium-expert | Normalized Score | 95.9 | 169 |
| Offline Reinforcement Learning | D4RL hopper-medium-expert | Normalized Score | 112.4 | 161 |
| Offline Reinforcement Learning | D4RL walker2d-medium-expert | Normalized Score | 110.1 | 132 |
| Offline Reinforcement Learning | D4RL Medium-Replay Hopper | Normalized Score | 60.9 | 109 |
| Offline Reinforcement Learning | D4RL Medium HalfCheetah | Normalized Score | 48.3 | 105 |
| Offline Reinforcement Learning under Gravity Shift | MuJoCo HalfCheetah | Normalized Return | 73.75 | 104 |
| Offline Reinforcement Learning | D4RL Medium Walker2d | Normalized Score | 83.7 | 104 |
| Offline Reinforcement Learning under Gravity Shift | MuJoCo Ant | Normalized Return | 34.04 | 104 |
| Offline Reinforcement Learning under Gravity Shift | MuJoCo Hopper | Normalized Return | 23 | 104 |
| Offline Reinforcement Learning | D4RL walker2d-random | Normalized Score | 140 | 101 |
Showing 10 of 586 rows
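
Normalized scores for D4RL tasks are usually reported on a scale where a random policy scores 0 and an expert policy scores 100, which is why values above 100 can appear. The sketch below assumes that convention. The function name and the reference returns in the usage lines are hypothetical placeholders, and the "Normalized Return" rows for the gravity-shift tasks may follow a different normalization.

```python
def normalized_score(raw_return, random_return, expert_return):
    """Rescale a raw episode return so that random = 0 and expert = 100."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Hypothetical reference returns, for illustration only.
halfway = normalized_score(5000.0, 0.0, 10000.0)        # midway between random and expert
above_expert = normalized_score(11000.0, 0.0, 10000.0)  # exceeds the expert reference
```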