Flattening Hierarchies with Policy Bootstrapping

About

Offline goal-conditioned reinforcement learning (GCRL) is a promising approach for pretraining generalist policies on large datasets of reward-free trajectories, akin to the self-supervised objectives used to train foundation models for computer vision and natural language processing. However, scaling GCRL to longer horizons remains challenging due to the combination of sparse rewards and discounting, which obscures the comparative advantages of primitive actions with respect to distant goals. Hierarchical RL methods achieve strong empirical results on long-horizon goal-reaching tasks, but their reliance on modular, timescale-specific policies and subgoal generation introduces significant additional complexity and hinders scaling to high-dimensional goal spaces. In this work, we introduce an algorithm to train a flat (non-hierarchical) goal-conditioned policy by bootstrapping on subgoal-conditioned policies with advantage-weighted importance sampling. Our approach eliminates the need for a generative model over the (sub)goal space, which we find is key for scaling to high-dimensional control in large state spaces. We further show that existing hierarchical and bootstrapping-based approaches correspond to specific design choices within our derivation. Across a comprehensive suite of state- and pixel-based locomotion and manipulation benchmarks, our method matches or surpasses state-of-the-art offline GCRL algorithms and scales to complex, long-horizon tasks where prior approaches fail. Project page: https://johnlyzhou.github.io/saw/
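To make the long-horizon difficulty concrete, here is a minimal sketch under assumptions that are not taken from the paper: a reward of 1 received only on reaching the goal, a discount factor of 0.99, and an idealized value of `gamma ** d` for a state `d` steps from the goal. The helper names `value` and `action_gap` are hypothetical. The sketch compares the value of an action that moves one step closer with one that moves one step farther.

```python
def value(distance: int, gamma: float = 0.99) -> float:
    """Idealized discounted value of a state `distance` steps from the goal.

    Assumes a sparse reward of 1 on arrival (an illustrative assumption,
    not the paper's exact reward convention).
    """
    return gamma ** distance


def action_gap(distance: int, gamma: float = 0.99) -> float:
    """Value difference between stepping one closer and one farther."""
    return value(distance - 1, gamma) - value(distance + 1, gamma)


if __name__ == "__main__":
    # Print the gap at increasingly distant goals.
    for d in (10, 100, 1000):
        print(f"distance={d:5d}  gap={action_gap(d):.2e}")
```

Under these assumptions the gap shrinks geometrically with distance, so even small estimation errors in a learned value function can swamp the signal that separates good primitive actions from bad ones for far-away goals.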

John L. Zhou, Jonathan C. Kao • 2025

Related benchmarks

Task	Dataset	Result
Goal-conditioned Reinforcement Learning	OGBench antmaze-medium-stitch v0	Success Rate: 64	12
Goal-conditioned Reinforcement Learning	OGBench humanoidmaze-medium-stitch v0	Success Rate: 63.6	12
Goal-conditioned Reinforcement Learning	OGBench humanoidmaze-large-stitch v0	Success Rate: 11.6	12
Goal-conditioned Reinforcement Learning	OGBench antmaze-large-explore v0	Success Rate: 1.9	12
Goal-conditioned Reinforcement Learning	OGBench antmaze-large-stitch v0	Success Rate: 3.1	12
Goal-conditioned Reinforcement Learning	OGBench antmaze-giant-stitch v0	Success Rate: 0	12
Goal-conditioned Reinforcement Learning	OGBench humanoidmaze-giant-stitch v0	Success Rate: 0	12
Goal-conditioned Reinforcement Learning	OGBench antmaze-medium-navigate v0	Success Rate: 96.3	11
Goal-conditioned Reinforcement Learning	manipulation cube-single-play	Success Rate: 73	11
Goal-conditioned Reinforcement Learning	OGBench antmaze-giant-navigate v0	Success Rate: 68.5	11

Showing 10 of 38 rows
