Test-Time Graph Search for Goal-Conditioned Reinforcement Learning

About

Offline goal-conditioned reinforcement learning (GCRL) often struggles with long-horizon tasks, where errors in value estimation accumulate and produce unreliable policies. It is typically assumed that effective long-term planning is infeasible without specialized training. In contrast, our work demonstrates that existing GCRL policies can complete long-horizon tasks when combined with a lightweight, training-free planning wrapper. We find that standard goal-conditioned value functions encode locally consistent geometric structure sufficient for planning. Our approach, Test-Time Graph Search (TTGS), constructs a graph over the offline dataset and employs an adaptive subgoal selection strategy. To address unreliable value estimates during shortest-path search, we propose a novel mechanism that softly penalizes long-distance transitions. Our method incurs negligible computational overhead and requires no additional supervision or parameter updates. On the OGBench benchmark, TTGS significantly boosts success rates across multiple base learners and tasks, with primary gains on challenging long-horizon locomotion tasks where some success rates are improved from near-zero to over 90%, often matching or outperforming methods that require complex auxiliary training. Code and videos can be found at https://ktolnos.github.io/ttgs.

Evgenii Opryshko, Junwei Quan, Claas Voelcker, Yilun Du, Igor Gilitschenski • 2025
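
To make the idea of softly penalizing long-distance transitions during shortest-path search more concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the penalty, so the linear form in `edge_cost`, the `threshold` and `penalty` parameters, and the toy distance matrix are all assumptions chosen for illustration. The matrix stands in for pairwise distances that a goal-conditioned value function might predict between dataset states, with one long-range estimate deliberately too optimistic.

```python
import heapq


def edge_cost(d, threshold=3.0, penalty=2.0):
    # Hypothetical soft penalty (an assumption, not the paper's formula):
    # edges longer than `threshold` stay usable but cost extra in
    # proportion to how far they exceed it.
    return d + penalty * max(0.0, d - threshold)


def shortest_path(dist, start, goal, threshold=3.0, penalty=2.0):
    """Dijkstra over a fully connected graph with penalized edge costs.

    `dist[u][v]` is an estimated distance from node u to node v.
    Returns the list of node indices from start to goal.
    """
    n = len(dist)
    best = [float("inf")] * n
    parent = [None] * n
    best[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > best[u]:
            continue  # stale heap entry
        if u == goal:
            break
        for v in range(n):
            if v == u:
                continue
            new_cost = cost + edge_cost(dist[u][v], threshold, penalty)
            if new_cost < best[v]:
                best[v] = new_cost
                parent[v] = u
                heapq.heappush(heap, (new_cost, v))
    if best[goal] == float("inf"):
        return []
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    path.reverse()
    return path


# Toy estimates for four states along a corridor. The 0 -> 3 entry (5)
# underestimates the true distance (6), mimicking an unreliable
# long-range value estimate.
estimated = [
    [0, 2, 4, 5],
    [2, 0, 2, 4],
    [4, 2, 0, 2],
    [5, 4, 2, 0],
]

print(shortest_path(estimated, 0, 3, penalty=0.0))  # no penalty
print(shortest_path(estimated, 0, 3, penalty=2.0))  # soft penalty
```

In this toy setup the unpenalized search takes the single optimistic jump from state 0 to state 3, while the penalized search prefers the chain of short hops, whose estimates are the ones the abstract describes as locally consistent. How the resulting waypoints are handed to the policy as subgoals is not covered by this sketch.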

Related benchmarks

Task	Dataset	Result
Goal-conditioned Reinforcement Learning	OGBench antmaze-large-explore v0	Success Rate: 91.8	12
Goal-conditioned Reinforcement Learning	OGBench humanoidmaze-giant-stitch v0	Success Rate: 88.8	12
Goal-conditioned Reinforcement Learning	OGBench antmaze-large-stitch v0	Success Rate: 91.4	12
Goal-conditioned Reinforcement Learning	OGBench antmaze-giant-stitch v0	Success Rate: 78.6	12
Goal-conditioned Reinforcement Learning	OGBench humanoidmaze-medium-stitch v0	Success Rate: 95	12
Goal-conditioned Reinforcement Learning	OGBench humanoidmaze-large-stitch v0	Success Rate: 75.6	12
Goal-conditioned Reinforcement Learning	OGBench antmaze-medium-stitch v0	Success Rate: 95.4	12
Goal-conditioned Reinforcement Learning	OGBench pointmaze-giant-stitch v0	Success Rate: 98	11
Goal-conditioned Reinforcement Learning	OGBench humanoidmaze-large-navigate v0	Success Rate: 86.4	11
Goal-conditioned Reinforcement Learning	OGBench humanoidmaze-giant-navigate v0	Success Rate: 92.8	11

Showing 10 of 27 rows
