
Flattening Hierarchies with Policy Bootstrapping

About

Offline goal-conditioned reinforcement learning (GCRL) is a promising approach for pretraining generalist policies on large datasets of reward-free trajectories, akin to the self-supervised objectives used to train foundation models for computer vision and natural language processing. However, scaling GCRL to longer horizons remains challenging due to the combination of sparse rewards and discounting, which obscures the comparative advantages of primitive actions with respect to distant goals. Hierarchical RL methods achieve strong empirical results on long-horizon goal-reaching tasks, but their reliance on modular, timescale-specific policies and subgoal generation introduces significant additional complexity and hinders scaling to high-dimensional goal spaces. In this work, we introduce an algorithm to train a flat (non-hierarchical) goal-conditioned policy by bootstrapping on subgoal-conditioned policies with advantage-weighted importance sampling. Our approach eliminates the need for a generative model over the (sub)goal space, which we find is key for scaling to high-dimensional control in large state spaces. We further show that existing hierarchical and bootstrapping-based approaches correspond to specific design choices within our derivation. Across a comprehensive suite of state- and pixel-based locomotion and manipulation benchmarks, our method matches or surpasses state-of-the-art offline GCRL algorithms and scales to complex, long-horizon tasks where prior approaches fail. Project page: https://johnlyzhou.github.io/saw/
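To make the point about sparse rewards and discounting concrete, here is a small illustrative sketch. It is not taken from the paper. It assumes a toy setting where the agent receives a reward of 1 on reaching the goal and 0 otherwise, so the optimal value at `distance` steps from the goal is `gamma ** distance`. The helper names `goal_value` and `action_gap` are hypothetical. Under these assumptions, the value difference between a step toward the goal and a step away from it shrinks geometrically as the goal gets farther away, which is one way to see why primitive-action advantages become hard to distinguish for distant goals.

```python
def goal_value(distance: int, gamma: float = 0.99) -> float:
    """Optimal value in the assumed toy setting: reward 1 at the goal, 0 elsewhere."""
    return gamma ** distance


def action_gap(distance: int, gamma: float = 0.99) -> float:
    """Value gap between stepping toward the goal and stepping away from it.

    A step toward the goal leaves the agent at distance - 1,
    a step away leaves it at distance + 1.
    """
    return goal_value(distance - 1, gamma) - goal_value(distance + 1, gamma)


if __name__ == "__main__":
    # The gap decays with distance, so learning signals for far goals are tiny.
    for d in (10, 100, 500):
        print(f"distance={d:4d}  gap={action_gap(d):.6f}")
```

In this toy setting the gap equals `gamma ** (distance - 1) * (1 - gamma ** 2)`, so any fixed amount of value-estimation error will eventually exceed it as the horizon grows.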

John L. Zhou, Jonathan C. Kao • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Goal-conditioned Reinforcement Learning | OGBench antmaze-medium-stitch v0 | Success Rate | 64 | 12 |
| Goal-conditioned Reinforcement Learning | OGBench humanoidmaze-medium-stitch v0 | Success Rate | 63.6 | 12 |
| Goal-conditioned Reinforcement Learning | OGBench humanoidmaze-large-stitch v0 | Success Rate | 11.6 | 12 |
| Goal-conditioned Reinforcement Learning | OGBench antmaze-large-explore v0 | Success Rate | 1.9 | 12 |
| Goal-conditioned Reinforcement Learning | OGBench antmaze-large-stitch v0 | Success Rate | 3.1 | 12 |
| Goal-conditioned Reinforcement Learning | OGBench antmaze-giant-stitch v0 | Success Rate | 0.0 | 12 |
| Goal-conditioned Reinforcement Learning | OGBench humanoidmaze-giant-stitch v0 | Success Rate | 0.0 | 12 |
| Goal-conditioned Reinforcement Learning | OGBench antmaze-medium-navigate v0 | Success Rate | 96.3 | 11 |
| Goal-conditioned Reinforcement Learning | manipulation cube-single-play | Success Rate | 73 | 11 |
| Goal-conditioned Reinforcement Learning | OGBench antmaze-giant-navigate v0 | Success Rate | 68.5 | 11 |
Showing 10 of 38 rows
