Physics-informed Value Learner for Offline Goal-Conditioned Reinforcement Learning

About

Offline Goal-Conditioned Reinforcement Learning (GCRL) holds great promise for domains such as autonomous navigation and locomotion, where collecting interactive data is costly and unsafe. However, it remains challenging in practice because it must learn from datasets with limited coverage of the state-action space and generalize across long-horizon tasks. To address these challenges, we propose a Physics-informed (Pi) regularized loss for value learning that is derived from the Eikonal Partial Differential Equation (PDE) and induces a geometric inductive bias in the learned value function. Unlike generic gradient penalties that are primarily used to stabilize training, our formulation is grounded in continuous-time optimal control and encourages value functions to align with cost-to-go structures. The proposed regularizer is broadly compatible with temporal-difference-based value learning and can be integrated into existing Offline GCRL algorithms. When combined with Hierarchical Implicit Q-Learning (HIQL), the resulting method, Eikonal-regularized HIQL (Eik-HIQL), yields significant improvements in both performance and generalization, with pronounced gains in stitching regimes and large-scale navigation tasks.
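To make the idea of an Eikonal-derived regularizer concrete, here is a minimal sketch. It is not the paper's implementation. It assumes the penalty is the mean squared residual of the Eikonal equation, ‖∇_s V(s, g)‖ · speed = 1, evaluated at sampled states; the function names (`grad_norm`, `eikonal_penalty`), the unit `speed`, and the finite-difference gradient are all illustrative choices, and the abstract does not specify the exact form of the loss.

```python
import math


def grad_norm(value_fn, state, goal, eps=1e-5):
    """Central finite-difference estimate of ||dV/ds|| at (state, goal)."""
    sq = 0.0
    for i in range(len(state)):
        hi = list(state)
        lo = list(state)
        hi[i] += eps
        lo[i] -= eps
        d = (value_fn(hi, goal) - value_fn(lo, goal)) / (2 * eps)
        sq += d * d
    return math.sqrt(sq)


def eikonal_penalty(value_fn, states, goal, speed=1.0):
    """Mean squared residual of ||grad_s V|| * speed = 1 (assumed form)."""
    residuals = [(grad_norm(value_fn, s, goal) * speed - 1.0) ** 2 for s in states]
    return sum(residuals) / len(residuals)


def neg_distance(state, goal):
    # Negative Euclidean distance: a cost-to-go-like value with unit gradient norm.
    return -math.dist(state, goal)


def scaled_distance(state, goal):
    # Same shape, but the gradient norm is 3, so the Eikonal residual is non-zero.
    return -3.0 * math.dist(state, goal)


states = [[0.0, 0.0], [1.0, 2.0], [-2.0, 0.5]]
goal = [3.0, 4.0]

penalty_exact = eikonal_penalty(neg_distance, states, goal)
penalty_scaled = eikonal_penalty(scaled_distance, states, goal)
print(penalty_exact, penalty_scaled)
```

In this toy setting, the distance-shaped value incurs essentially no penalty, while the rescaled one is penalized. In a training loop, a term of this kind would presumably be added to the temporal-difference value loss with a weighting coefficient, and the gradient would come from automatic differentiation rather than finite differences; both points are assumptions here.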

Vittorio Giammarino, Ruiqi Ni, Ahmed H. Qureshi • 2025

Related benchmarks

| Task | Dataset | Result (Success Rate) | Rank |
| Goal-conditioned Reinforcement Learning | antmaze stitch large | 84 | 23 |
| Goal-conditioned Reinforcement Learning | antmaze stitch medium | 0.94 | 23 |
| Goal-conditioned Reinforcement Learning | humanoidmaze stitch large | 29 | 14 |
| Goal-conditioned Reinforcement Learning | humanoidmaze stitch medium | 79 | 14 |
| Goal-conditioned Reinforcement Learning | antsoccer stitch arena | 2 | 14 |
| Goal-conditioned Reinforcement Learning | manipulation scene-play | 0.52 | 14 |
| Goal-conditioned Reinforcement Learning | pointmaze navigate medium | 93 | 11 |
| Goal-conditioned Reinforcement Learning | manipulation cube-single-play | 25 | 11 |
| Goal-conditioned Reinforcement Learning | manipulation-cube-single-play (test) | 0.25 | 11 |
| Goal-conditioned Reinforcement Learning | manipulation-scene-play (test) | 52 | 5 |
Showing 10 of 40 rows
