Test-Time Graph Search for Goal-Conditioned Reinforcement Learning

About

Offline goal-conditioned reinforcement learning (GCRL) often struggles with long-horizon tasks, where errors in value estimation accumulate and produce unreliable policies. It is typically assumed that effective long-term planning is infeasible without specialized training. In contrast, our work demonstrates that existing GCRL policies can complete long-horizon tasks when combined with a lightweight, training-free planning wrapper. We find that standard goal-conditioned value functions encode locally consistent geometric structure sufficient for planning. Our approach, Test-Time Graph Search (TTGS), constructs a graph over the offline dataset and employs an adaptive subgoal selection strategy. To address unreliable value estimates during shortest-path search, we propose a novel mechanism that softly penalizes long-distance transitions. Our method incurs negligible computational overhead and requires no additional supervision or parameter updates. On the OGBench benchmark, TTGS significantly boosts success rates across multiple base learners and tasks, with primary gains on challenging long-horizon locomotion tasks where some success rates are improved from near-zero to over 90%, often matching or outperforming methods that require complex auxiliary training. Code and videos can be found at https://ktolnos.github.io/ttgs.
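The idea of softly penalizing long-distance transitions before a shortest-path search can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `penalize_long_edges` and `shortest_path`, the `threshold` and `power` parameters, and the power-law form of the penalty are all assumptions chosen for illustration. The sketch assumes pairwise distance estimates between dataset states are already available as a dense matrix (in TTGS these would come from a goal-conditioned value function).

```python
import heapq
import numpy as np


def penalize_long_edges(dist, threshold, power=2.0):
    """Hypothetical soft penalty: edges longer than `threshold` get a
    super-linearly inflated cost; shorter edges keep their estimate."""
    dist = np.asarray(dist, dtype=float)
    cost = dist.copy()
    long_mask = dist > threshold
    cost[long_mask] = threshold * (dist[long_mask] / threshold) ** power
    return cost


def shortest_path(cost, start, goal):
    """Dijkstra over a dense cost matrix; returns node indices from start to goal."""
    n = cost.shape[0]
    best = np.full(n, np.inf)
    parent = [-1] * n
    best[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u]:
            continue  # stale heap entry
        if u == goal:
            break
        for v in range(n):
            if v == u:
                continue
            nd = d + cost[u, v]
            if nd < best[v]:
                best[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    if not np.isfinite(best[goal]):
        return []
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]


# Toy example: four states on a line, with the long 0 <-> 3 distance
# underestimated (true distance 3, estimated 2) to mimic an unreliable value.
x = np.arange(4, dtype=float)
dist = np.abs(x[:, None] - x[None, :])
dist[0, 3] = dist[3, 0] = 2.0

print(shortest_path(dist, 0, 3))                                      # [0, 3]
print(shortest_path(penalize_long_edges(dist, threshold=1.0), 0, 3))  # [0, 1, 2, 3]
```

In this toy case the raw search jumps straight across the underestimated long edge, while the penalized search prefers a chain of short, locally reliable hops. The intermediate nodes on such a path could then serve as subgoals for the base policy.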

Evgenii Opryshko, Junwei Quan, Claas Voelcker, Yilun Du, Igor Gilitschenski • 2025

Related benchmarks

Task                                     Dataset                                 Success Rate  Rank
Goal-conditioned Reinforcement Learning  OGBench antmaze-large-explore v0        91.8          12
Goal-conditioned Reinforcement Learning  OGBench humanoidmaze-giant-stitch v0    88.8          12
Goal-conditioned Reinforcement Learning  OGBench antmaze-large-stitch v0         91.4          12
Goal-conditioned Reinforcement Learning  OGBench antmaze-giant-stitch v0         78.6          12
Goal-conditioned Reinforcement Learning  OGBench humanoidmaze-medium-stitch v0   95            12
Goal-conditioned Reinforcement Learning  OGBench humanoidmaze-large-stitch v0    75.6          12
Goal-conditioned Reinforcement Learning  OGBench antmaze-medium-stitch v0        95.4          12
Goal-conditioned Reinforcement Learning  OGBench pointmaze-giant-stitch v0       98            11
Goal-conditioned Reinforcement Learning  OGBench humanoidmaze-large-navigate v0  86.4          11
Goal-conditioned Reinforcement Learning  OGBench humanoidmaze-giant-navigate v0  92.8          11
Showing 10 of 27 rows
