
Efficient Online Reinforcement Learning with Offline Data

About

Sample efficiency and exploration remain major challenges in online reinforcement learning (RL). A powerful approach that can be applied to address these issues is the inclusion of offline data, such as prior trajectories from a human expert or a sub-optimal exploration policy. Previous methods have relied on extensive modifications and additional complexity to ensure the effective use of this data. Instead, we ask: can we simply apply existing off-policy methods to leverage offline data when learning online? In this work, we demonstrate that the answer is yes; however, a set of minimal but important changes to existing off-policy RL algorithms are required to achieve reliable performance. We extensively ablate these design choices, demonstrating the key factors that most affect performance, and arrive at a set of recommendations that practitioners can readily apply, whether their data comprise a small number of expert demonstrations or large volumes of sub-optimal trajectories. We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches across a diverse set of competitive benchmarks, with no additional computational overhead. We have released our code at https://github.com/ikostrikov/rlpd.

Philip J. Ball, Laura Smith, Ilya Kostrikov, Sergey Levine • 2023
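
The abstract does not spell out the specific changes, so the following is only a minimal sketch of the general idea of including offline data while learning online: each training batch is drawn partly from a fixed offline dataset and partly from the online replay buffer. The function name `sample_mixed_batch`, the default 50/50 split, and the tuple layout of a transition are assumptions made for illustration, not details taken from the text above or from the released code.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")  # one stored transition, e.g. (obs, action, reward, next_obs, done)


def sample_mixed_batch(
    offline_data: Sequence[T],
    online_buffer: Sequence[T],
    batch_size: int,
    offline_fraction: float = 0.5,  # assumed ratio, not taken from the abstract
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Draw one training batch from both data sources (with replacement)."""
    if not offline_data and not online_buffer:
        raise ValueError("both data sources are empty")
    rng = rng or random.Random()

    # If one source is empty (e.g. before any online steps), use only the other.
    if not online_buffer:
        n_offline = batch_size
    elif not offline_data:
        n_offline = 0
    else:
        n_offline = int(batch_size * offline_fraction)
    n_online = batch_size - n_offline

    batch = [rng.choice(offline_data) for _ in range(n_offline)]
    batch += [rng.choice(online_buffer) for _ in range(n_online)]
    rng.shuffle(batch)  # avoid a fixed offline-then-online ordering
    return batch
```

In a training loop, a batch produced this way would be handed to an otherwise unchanged off-policy update step, while newly collected transitions are appended only to the online buffer. Sampling with replacement keeps the sketch short; a real replay buffer would typically sample indices in bulk.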

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Lift | Robomimic Lift-State | Success Rate | 99 | 30 |
| Square Nut Assembly | Robomimic Square-State | Success Rate | 0.00e+0 | 30 |
| Can Pick & Place | Robomimic Can-State | Success Rate | 0.00e+0 | 30 |
| Goal-conditioned manipulation | OGBench puzzle-4x4-play | Score | 58 | 24 |
| Robotic Manipulation | Can-Image | Success Rate | 0.00e+0 | 21 |
| Locomotion | MuJoCo walker2d medium-replay D4RL | Average Normalized Score | 119 | 16 |
| Navigation | OGBench humanoidmaze-medium-navigate | Success Rate (Offline) | 0.00e+0 | 15 |
| Quadruped Locomotion | Slippery Slope real-world evaluation | Forward Progression | 0.35 | 15 |
| Robotic Manipulation | OGBench puzzle-3x3-sparse online | Success Rate | 100 | 14 |
| Locomotion | MuJoCo hopper-random | Normalized Score | 90.2 | 14 |
Showing 10 of 102 rows
