HIQL: Offline Goal-Conditioned RL with Latent States as Actions

About

Unsupervised pre-training has recently become the bedrock for computer vision and natural language processing. In reinforcement learning (RL), goal-conditioned RL can potentially provide an analogous self-supervised approach for making use of large quantities of unlabeled (reward-free) data. However, building effective algorithms for goal-conditioned RL that can learn directly from diverse offline data is challenging, because it is hard to accurately estimate the exact value function for faraway goals. Nonetheless, goal-reaching problems exhibit structure, such that reaching distant goals entails first passing through closer subgoals. This structure can be very useful, as assessing the quality of actions for nearby goals is typically easier than for more distant goals. Based on this idea, we propose a hierarchical algorithm for goal-conditioned RL from offline data. Using one action-free value function, we learn two policies that allow us to exploit this structure: a high-level policy that treats states as actions and predicts (a latent representation of) a subgoal and a low-level policy that predicts the action for reaching this subgoal. Through analysis and didactic examples, we show how this hierarchical decomposition makes our method robust to noise in the estimated value function. We then apply our method to offline goal-reaching benchmarks, showing that our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data. Our code is available at https://seohong.me/projects/hiql/
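The robustness argument in the abstract can be illustrated with a small toy calculation. The sketch below is not the authors' analysis or code; it rests on assumptions introduced here for illustration only: states lie on a 1D line, the true value is the negative distance to the goal, each value estimate carries independent Gaussian noise whose standard deviation is `sigma` times the magnitude of the true value, and the high-level policy proposes subgoals `k` steps away. The function names are hypothetical. Under these assumptions, the probability of stepping in the correct direction has a closed form.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_prefers_closer(dist, step, sigma):
    """Probability that noisy values rank the closer candidate first.

    Two candidates sit `step` closer to and `step` farther from a target
    that is `dist` away. Assumed toy model: true value = -distance, and
    each estimate has independent Gaussian noise with std sigma * |value|.
    """
    near, far = dist - step, dist + step
    gap = far - near  # true value gap between the two candidates (2 * step)
    noise_std = sigma * math.sqrt(near ** 2 + far ** 2)
    if noise_std == 0.0:
        return 1.0  # no noise: the closer candidate always wins
    return normal_cdf(gap / noise_std)


def flat_success(dist, sigma):
    """Flat policy: compare the two one-step neighbours against the far goal."""
    return p_prefers_closer(dist, 1, sigma)


def hierarchical_success(dist, k, sigma):
    """Hierarchical policy: choose a subgoal k steps away, then step toward it."""
    p_high = p_prefers_closer(dist, k, sigma)  # picks the subgoal on the goal side
    p_low = p_prefers_closer(k, 1, sigma)      # steps toward the chosen subgoal
    # The step is correct if both levels are right, or if both are wrong
    # (wrong subgoal, then a wrong step away from it).
    return p_high * p_low + (1.0 - p_high) * (1.0 - p_low)


# Hypothetical settings: goal 50 steps away, subgoals 5 steps away.
print("flat:", flat_success(50, 0.1))
print("hierarchical:", hierarchical_success(50, 5, 0.1))
```

With these settings the hierarchical choice picks the correct direction noticeably more often than the flat one. The high-level comparison sees a value gap that grows with `k`, and the low-level comparison involves only small values, which carry little noise under the assumed noise model.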

Seohong Park, Dibya Ghosh, Benjamin Eysenbach, Sergey Levine • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Goal-conditioned manipulation | OGBench puzzle-4x4-play | Score 0.07 | 24 |
| Goal-conditioned Reinforcement Learning | antmaze stitch large | Success Rate 74 | 23 |
| Goal-conditioned Reinforcement Learning | antmaze stitch medium | Success Rate 0.95 | 23 |
| Manipulation | OGBench cube-triple-play | Success Rate 3 | 19 |
| Robotic Planning | OGBench AntMaze Giant 48 (stitch) | Success Rate 21 | 16 |
| Robotic Planning | OGBench Scene 48 (play) | Success Rate 0.38 | 16 |
| Robotic Planning | OGBench PointMaze Giant 48 (stitch) | Success Rate 0.00e+0 | 16 |
| Goal-conditioned Reinforcement Learning | humanoidmaze stitch medium | Success Rate 88 | 14 |
| Offline Goal-Conditioned Reinforcement Learning | antmaze medium-navigate v0 | Success Rate 96 | 14 |
| Goal-conditioned Reinforcement Learning | humanoidmaze stitch large | Success Rate 28 | 14 |
Showing 10 of 220 rows
...
