Revisiting the Minimalist Approach to Offline Reinforcement Learning

About

Recent years have witnessed significant advancements in offline reinforcement learning (RL), resulting in the development of numerous algorithms with varying degrees of complexity. While these algorithms have led to noteworthy improvements, many incorporate seemingly minor design choices that impact their effectiveness beyond core algorithmic advances. However, the effect of these design choices on established baselines remains understudied. In this work, we aim to bridge this gap by conducting a retrospective analysis of recent works in offline RL and proposing ReBRAC, a minimalistic algorithm that integrates such design elements on top of the TD3+BC method. We evaluate ReBRAC on 51 datasets with both proprioceptive and visual state spaces using the D4RL and V-D4RL benchmarks, demonstrating its state-of-the-art performance among ensemble-free methods in both offline and offline-to-online settings. To further illustrate the efficacy of these design choices, we perform a large-scale ablation study and hyperparameter sensitivity analysis spanning thousands of experiments.

Denis Tarasov, Vladislav Kurenkov, Alexander Nikulin, Sergey Kolesnikov • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Offline Reinforcement Learning | OGBench antmaze-large-navigate-singletask task1-v0 to task5-v0 | Score: 91 | 55 |
| Offline Reinforcement Learning | D4RL antmaze-umaze (diverse) | Normalized Score: 88.3 | 40 |
| Offline Reinforcement Learning | D4RL MuJoCo Hopper medium standard | Normalized Score: 102 | 36 |
| Offline Reinforcement Learning | D4RL Adroit pen (human) | Normalized Return: 103.5 | 32 |
| Offline Reinforcement Learning | D4RL Adroit pen (cloned) | Normalized Return: 102.8 | 32 |
| Offline Reinforcement Learning | D4RL antmaze-large (play) | Normalized Score: 60.4 | 26 |
| Offline Reinforcement Learning | D4RL antmaze-large (diverse) | Normalized Score: 54.4 | 26 |
| Offline Reinforcement Learning | D4RL antmaze-med (diverse) | Normalized Score: 76.3 | 26 |
| Offline Reinforcement Learning | MuJoCo hopper D4RL (medium-replay) | Normalized Return: 98.1 | 26 |
| Offline Reinforcement Learning | OGBench antmaze-giant-navigate-singletask task1-v0 to task5-v0 | Score: 49 | 22 |
Showing 10 of 67 rows
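
For readers unfamiliar with the "Normalized Score" and "Normalized Return" columns on the D4RL rows, the following is a minimal sketch of the usual D4RL normalization, which rescales a raw episode return so that a random policy maps to 0 and an expert policy maps to 100. The function name and the reference returns below are hypothetical placeholders chosen for illustration, not values from D4RL or from the paper. The OGBench "Score" rows are not assumed to use this formula.

```python
def d4rl_normalized_score(raw_return: float,
                          random_return: float,
                          expert_return: float) -> float:
    """Rescale a raw return so that random = 0 and expert = 100."""
    if expert_return == random_return:
        raise ValueError("expert and random reference returns must differ")
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Hypothetical reference returns, for illustration only
# (these are NOT the actual D4RL reference values for any task).
RANDOM_RETURN = 0.0
EXPERT_RETURN = 3000.0

if __name__ == "__main__":
    # A policy halfway between the random and expert references.
    print(d4rl_normalized_score(1500.0, RANDOM_RETURN, EXPERT_RETURN))  # 50.0
```

Under this convention, a value above 100 (such as the 102 and 103.5 entries in the table) would mean the policy's return exceeded the expert reference for that task.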
