Switching Successor Measures for Hierarchical Zero-shot Reinforcement Learning

About

Hierarchical reinforcement learning can improve generalization by decomposing long-horizon decision-making into simpler subproblems. However, existing approaches often rely on restrictive design choices, such as fixed temporal abstractions or goal-conditioned objectives, which largely confine them to goal-reaching tasks and limit their applicability to general reward functions. In this paper, we introduce switching successor measures, an extension of successor measures that enables hierarchical control in zero-shot reinforcement learning without additional supervision, fixed horizons, or manually designed subgoals. We show that switching successor measures arise naturally from classical successor measures while preserving their underlying structure. Building on this result, we propose FB $\pi$-Switch, an algorithm that extracts both a high-level subgoal-selection policy and a low-level control policy directly from forward-backward (FB) representations, allowing hierarchical behavior to emerge from a single learned representation. Experiments on both goal-conditioned and general reward-based tasks show that FB $\pi$-Switch improves over non-hierarchical baselines and matches state-of-the-art hierarchical methods in goal-conditioned settings. These results demonstrate that structured successor representations provide a flexible foundation for hierarchical zero-shot reinforcement learning beyond goal-reaching tasks. Our project website is available at: https://stestokth.github.io/switching-successors/.

Stefan Stojanovic, Alexandre Proutiere • 2026
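
To make the notion of a successor measure in the abstract concrete, here is a minimal tabular sketch. It is not the paper's FB $\pi$-Switch algorithm and does not involve learned forward-backward representations or any switching mechanism. It only illustrates the classical identity that the successor measure of a fixed policy, M = sum_t gamma^t P^t, is computed once and then evaluates that policy for any reward vector r via V = M r, which is the property zero-shot methods build on. The 3-state chain, the reward vectors, and the function names are all hypothetical and chosen for illustration.

```python
# Tabular sketch: successor measure of a fixed policy on a toy 3-state chain.
# Everything here (chain, rewards, names) is a hypothetical illustration.
GAMMA = 0.9

# Policy-induced transition matrix P[s][s'] (each row sums to 1).
# State 0 moves to 1, state 1 moves to 2, state 2 is absorbing.
P = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]


def successor_measure(P, gamma, iters=500):
    """Iterate M <- I + gamma * P @ M, which converges to sum_t gamma^t P^t."""
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(iters):
        M = [
            [
                identity[i][j] + gamma * sum(P[i][k] * M[k][j] for k in range(n))
                for j in range(n)
            ]
            for i in range(n)
        ]
    return M


def evaluate(M, reward):
    """Value of the fixed policy under a state reward: V = M @ reward."""
    return [sum(m * r for m, r in zip(row, reward)) for row in M]


M = successor_measure(P, GAMMA)

# The same M evaluates a goal-reaching reward and a general (dense) reward.
goal_reward = [0.0, 0.0, 1.0]
dense_reward = [1.0, 0.5, 0.0]
v_goal = evaluate(M, goal_reward)
v_dense = evaluate(M, dense_reward)
```

The point of the sketch is that `M` is computed without reference to any reward, so swapping `goal_reward` for `dense_reward` needs no further learning, only a matrix-vector product.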

Related benchmarks

Task	Dataset	Result
Goal Reaching	antmaze teleport-navigate v0	Success Rate: 40	17
Goal Reaching	AntMaze Medium navigate-v0 (test-time goals)	Success Rate: 87	10
Goal Reaching	AntMaze Large test-time goals navigate-v0	Success Rate: 66	10
Goal Reaching	AntMaze Teleport navigate-v0 (test-time goals)	Average Success Rate: 40	10
Goal Reaching	AntMaze Giant navigate test-time goals v0	Average Success Rate: 1	10
Goal Reaching	antmaze medium-navigate v0	Success Rate: 87	8
Goal Reaching	antmaze large-navigate v0	Success Rate: 66	8
Goal Reaching	antmaze giant-navigate v0	Success Rate: 1	8
Continuous Control	AntMaze Large	Task 1 Score: 938	6
Continuous Control	Antmaze Giant	Task 1 Performance: 125	6
