Test-Time Graph Search for Goal-Conditioned Reinforcement Learning
About
Offline goal-conditioned reinforcement learning (GCRL) often struggles with long-horizon tasks, where errors in value estimation accumulate and produce unreliable policies. It is typically assumed that effective long-term planning is infeasible without specialized training. In contrast, our work demonstrates that existing GCRL policies can complete long-horizon tasks when combined with a lightweight, training-free planning wrapper. We find that standard goal-conditioned value functions encode locally consistent geometric structure sufficient for planning. Our approach, Test-Time Graph Search (TTGS), constructs a graph over the offline dataset and employs an adaptive subgoal selection strategy. To address unreliable value estimates during shortest-path search, we propose a novel mechanism that softly penalizes long-distance transitions. Our method incurs negligible computational overhead and requires no additional supervision or parameter updates. On the OGBench benchmark, TTGS significantly boosts success rates across multiple base learners and tasks, with primary gains on challenging long-horizon locomotion tasks where some success rates are improved from near-zero to over 90%, often matching or outperforming methods that require complex auxiliary training. Code and videos can be found at https://ktolnos.github.io/ttgs.
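To make the idea of softly penalizing long-distance transitions concrete, here is a minimal sketch of shortest-path search over a small graph. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact penalty formula, so the linear surcharge in `soft_penalized_weight`, the threshold `tau`, the coefficient `penalty`, and the toy distance matrix are all hypothetical. The matrix stands in for distances derived from a goal-conditioned value function, with one long-range estimate deliberately underestimated.

```python
import heapq


def soft_penalized_weight(d, tau=1.0, penalty=5.0):
    """Hypothetical soft penalty: edges longer than tau stay in the graph
    but pay a surcharge proportional to how far they exceed tau."""
    if d <= tau:
        return d
    return d + penalty * (d - tau)


def shortest_path(dist, start, goal, tau=1.0, penalty=5.0):
    """Dijkstra over a dense graph whose edge costs are the penalized distances."""
    n = len(dist)
    best = [float("inf")] * n
    prev = [None] * n
    best[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > best[u]:
            continue  # stale heap entry
        if u == goal:
            break
        for v in range(n):
            if v == u:
                continue
            new_cost = cost + soft_penalized_weight(dist[u][v], tau, penalty)
            if new_cost < best[v]:
                best[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))
    if best[goal] == float("inf"):
        return []  # goal not reachable
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Toy data (assumed): four states on a line at positions 0..3.
positions = [0, 1, 2, 3]
dist = [[float(abs(a - b)) for b in positions] for a in positions]
# Simulate an unreliable long-range value estimate: the true distance is 3.0.
dist[0][3] = dist[3][0] = 1.5

path_without_penalty = shortest_path(dist, 0, 3, penalty=0.0)
path_with_penalty = shortest_path(dist, 0, 3, penalty=5.0)
print(path_without_penalty, path_with_penalty)
```

In this toy setup, the unpenalized search takes the direct edge with the underestimated distance, while the penalized search routes through short, locally reliable edges. A wrapper in this spirit would then hand the next node on the path to the frozen goal-conditioned policy as a subgoal; how TTGS chooses that subgoal adaptively is not detailed in the abstract.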
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Goal-conditioned Reinforcement Learning | OGBench antmaze-large-explore v0 | Success Rate | 91.8 | 12 |
| Goal-conditioned Reinforcement Learning | OGBench humanoidmaze-giant-stitch v0 | Success Rate | 88.8 | 12 |
| Goal-conditioned Reinforcement Learning | OGBench antmaze-large-stitch v0 | Success Rate | 91.4 | 12 |
| Goal-conditioned Reinforcement Learning | OGBench antmaze-giant-stitch v0 | Success Rate | 78.6 | 12 |
| Goal-conditioned Reinforcement Learning | OGBench humanoidmaze-medium-stitch v0 | Success Rate | 95 | 12 |
| Goal-conditioned Reinforcement Learning | OGBench humanoidmaze-large-stitch v0 | Success Rate | 75.6 | 12 |
| Goal-conditioned Reinforcement Learning | OGBench antmaze-medium-stitch v0 | Success Rate | 95.4 | 12 |
| Goal-conditioned Reinforcement Learning | OGBench pointmaze-giant-stitch v0 | Success Rate | 98 | 11 |
| Goal-conditioned Reinforcement Learning | OGBench humanoidmaze-large-navigate v0 | Success Rate | 86.4 | 11 |
| Goal-conditioned Reinforcement Learning | OGBench humanoidmaze-giant-navigate v0 | Success Rate | 92.8 | 11 |