Option-aware Temporally Abstracted Value for Offline Goal-Conditioned Reinforcement Learning

About

Offline goal-conditioned reinforcement learning (GCRL) offers a practical learning paradigm in which goal-reaching policies are trained from abundant state-action trajectory datasets without additional environment interaction. However, offline GCRL still struggles with long-horizon tasks, even with recent advances that employ hierarchical policy structures, such as HIQL. In identifying the root cause of this challenge, we make two observations. First, the performance bottleneck stems mainly from the high-level policy's inability to generate appropriate subgoals. Second, when the high-level policy is learned in the long-horizon regime, the sign of the advantage estimate is frequently incorrect. We therefore argue that improving the value function so that it yields clear advantage estimates for learning the high-level policy is essential. In this paper, we propose a simple yet effective solution: Option-aware Temporally Abstracted value learning, dubbed OTA, which incorporates temporal abstraction into the temporal-difference learning process. By making the value update option-aware, our approach contracts the effective horizon length, enabling better advantage estimates even in long-horizon regimes. Experiments show that the high-level policy learned with the OTA value function achieves strong performance on complex tasks from OGBench, a recently proposed offline GCRL benchmark, including maze navigation and visual robotic manipulation environments.
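The claim that contracting the effective horizon yields clearer advantage estimates can be illustrated numerically. The sketch below is not OTA's update rule; it is a toy calculation under stated assumptions: a reward of -1 per step until the goal, a discount of 0.99, a hypothetical fixed option length k = 25, and d / k treated as a continuous number of options. It compares the value gap between two candidate subgoals 1,000 and 1,010 steps from the goal, then applies the same hypothetical estimation error to both settings.

```python
import math

# Assumed convention (not taken from the paper): reward -1 per step until the
# goal is reached, so a state d steps from the goal has value
#   V(d) = -(1 - gamma**d) / (1 - gamma).
def primitive_value(d: float, gamma: float) -> float:
    return -(1.0 - gamma ** d) / (1.0 - gamma)

# Idealized temporal abstraction: one "step" per option of k primitive steps,
# so the same state is only d / k abstract steps from the goal.
# k is a hypothetical fixed option length chosen for illustration.
def abstracted_value(d: float, gamma: float, k: int) -> float:
    return primitive_value(d / k, gamma)

gamma, k = 0.99, 25
near, far = 1000, 1010  # distances of two candidate subgoals to the final goal
eps = 0.01              # hypothetical overestimate of the farther subgoal's value

# True value gap, which should favor the nearer subgoal (positive sign).
gap_primitive = primitive_value(near, gamma) - primitive_value(far, gamma)
gap_abstract = abstracted_value(near, gamma, k) - abstracted_value(far, gamma, k)

# The same additive error in both settings; it reverses only the small primitive gap.
adv_primitive = gap_primitive - eps
adv_abstract = gap_abstract - eps

print(f"primitive gap {gap_primitive:.5f}, sign {'+' if adv_primitive > 0 else '-'}")
print(f"abstract  gap {gap_abstract:.5f}, sign {'+' if adv_abstract > 0 else '-'}")
```

With primitive steps, both subgoal values sit close to the saturation level -1/(1 - gamma), so a small error flips the sign of their difference. Counting distance in options keeps the gap large relative to the same error. Real options terminate at discrete times, so `math.ceil(d / k)` would be a more literal count, at the cost of assigning equal values to nearby states.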

Hongjoon Ahn, Heewoong Choi, Jisu Han, Taesup Moon • 2025

Related benchmarks

Task	Dataset	Success Rate (%)	
Object Manipulation	OGBench cube play (Double)	5	39
Object Manipulation	OGBench cube play (Quadruple)	0	28
Goal-conditioned locomotion	OGBench HumanoidMaze-Stitch Medium	92	19
Goal-conditioned locomotion	OGBench HumanoidMaze-Stitch Large	43	19
Object Manipulation	OGBench cube play (Triple)	2	19
Goal Reaching	antmaze teleport-navigate v0	53	17
Navigation	humanoidmaze medium-navigate v0 (test)	95	16
Navigation	humanoidmaze-large-navigate v0 (test)	83	16
Manipulation	scene-play v0	34	16
Manipulation	OGBench cube-triple-noisy	1	16

Showing 10 of 72 rows

...
