Switching Successor Measures for Hierarchical Zero-shot Reinforcement Learning
About
Hierarchical reinforcement learning can improve generalization by decomposing long-horizon decision-making into simpler subproblems. However, existing approaches often rely on restrictive design choices, such as fixed temporal abstractions or goal-conditioned objectives, which largely confine them to goal-reaching tasks and limit their applicability to general reward functions. In this paper, we introduce switching successor measures, an extension of successor measures that enables hierarchical control in zero-shot reinforcement learning without additional supervision, fixed horizons, or manually designed subgoals. We show that switching successor measures arise naturally from classical successor measures while preserving their underlying structure. Building on this result, we propose FB $\pi$-Switch, an algorithm that extracts both a high-level subgoal-selection policy and a low-level control policy directly from forward-backward (FB) representations, allowing hierarchical behavior to emerge from a single learned representation. Experiments on both goal-conditioned and general reward-based tasks show that FB $\pi$-Switch improves over non-hierarchical baselines and matches state-of-the-art hierarchical methods in goal-conditioned settings. These results demonstrate that structured successor representations provide a flexible foundation for hierarchical zero-shot reinforcement learning beyond goal-reaching tasks. Our project website is available at: https://stestokth.github.io/switching-successors/.
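As background for the abstract's references to classical successor measures, the following is a minimal tabular sketch, not the paper's FB $\pi$-Switch algorithm. It assumes a fixed policy whose induced state-transition matrix `P` is known, and it uses the convention $M = \sum_{t \ge 0} \gamma^t P^t$ (some works start the sum at $t = 1$). The three-state chain, the two reward vectors, and all function names are hypothetical and chosen only for illustration. The point is that once $M$ is computed, the value of that policy under any reward vector is a single matrix-vector product, which is the property zero-shot methods build on.

```python
# Tabular successor measure for a FIXED policy (illustrative assumption:
# P is the known state-to-state transition matrix induced by that policy).

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def successor_measure(P, gamma, iters=500):
    """Iterate M <- I + gamma * P @ M, which converges to sum_t gamma^t P^t."""
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(iters):
        PM = matmul(P, M)
        M = [[identity[i][j] + gamma * PM[i][j] for j in range(n)] for i in range(n)]
    return M

def value_from_measure(M, r):
    """Value of the fixed policy under reward vector r: V = M r."""
    return [sum(m * ri for m, ri in zip(row, r)) for row in M]

# Hypothetical chain 0 -> 1 -> 2, where state 2 is absorbing.
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]]
gamma = 0.9
M = successor_measure(P, gamma)

# The same M evaluates a goal-style reward and a general dense reward.
r_goal = [0.0, 0.0, 1.0]
r_dense = [1.0, -0.5, 0.2]
v_goal = value_from_measure(M, r_goal)
v_dense = value_from_measure(M, r_dense)
```

Here `M` is computed once and reused for both rewards; no further learning or planning is needed when the reward changes. FB representations approximate this object with learned low-rank factors rather than an explicit matrix, which this sketch does not attempt.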
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Goal Reaching | AntMaze Medium navigate-v0 (test-time goals) | Success Rate | 87 | 10 |
| Goal Reaching | AntMaze Large test-time goals navigate-v0 | Success Rate | 66 | 10 |
| Goal Reaching | AntMaze Teleport navigate-v0 (test-time goals) | Average Success Rate | 40 | 10 |
| Goal Reaching | AntMaze Giant navigate test-time goals v0 | Average Success Rate | 1 | 10 |
| Goal Reaching | antmaze medium-navigate v0 | Success Rate | 87 | 8 |
| Goal Reaching | antmaze large-navigate v0 | Success Rate | 66 | 8 |
| Goal Reaching | antmaze giant-navigate v0 | Success Rate | 1 | 8 |
| Goal Reaching | antmaze teleport-navigate v0 | Success Rate | 40 | 8 |
| Continuous Control | AntMaze Large | Task 1 Score | 938 | 6 |
| Continuous Control | Antmaze Giant | Task 1 Performance | 125 | 6 |