Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning
About
Goal-Conditioned Reinforcement Learning (GCRL) mitigates the difficulty of reward design by framing tasks as goal reaching rather than maximizing hand-crafted reward signals. In this setting, the optimal goal-conditioned value function naturally forms a quasimetric, motivating Quasimetric RL (QRL), which constrains value learning to quasimetric mappings and enforces local consistency through discrete, trajectory-based constraints. We propose Eikonal-Constrained Quasimetric RL (Eik-QRL), a continuous-time reformulation of QRL based on the Eikonal Partial Differential Equation (PDE). This PDE-based structure makes Eik-QRL trajectory-free, requiring only sampled states and goals, while improving out-of-distribution generalization. We provide theoretical guarantees for Eik-QRL and identify limitations that arise under complex dynamics. To address these challenges, we introduce Eik-Hierarchical QRL (Eik-HiQRL), which integrates Eik-QRL into a hierarchical decomposition. Empirically, Eik-HiQRL achieves state-of-the-art performance in offline goal-conditioned navigation and yields consistent gains over QRL in manipulation tasks, matching temporal-difference methods.
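To make the trajectory-free idea concrete, the sketch below evaluates an Eikonal-style residual on sampled state-goal pairs only. It assumes the classic unit-speed Eikonal equation, in which the gradient of the distance with respect to the state has norm 1; the abstract does not give the exact loss used by Eik-QRL, so this is an illustration rather than the authors' objective. The names `euclidean_distance`, `grad_wrt_state`, and `eikonal_loss` are hypothetical, and the gradient is approximated by finite differences rather than automatic differentiation.

```python
import math
import random


def euclidean_distance(s, g):
    """Toy candidate value: straight-line distance from state s to goal g."""
    return math.sqrt(sum((si - gi) ** 2 for si, gi in zip(s, g)))


def grad_wrt_state(d, s, g, eps=1e-5):
    """Central finite-difference gradient of d(s, g) with respect to s."""
    grad = []
    for i in range(len(s)):
        hi, lo = list(s), list(s)
        hi[i] += eps
        lo[i] -= eps
        grad.append((d(hi, g) - d(lo, g)) / (2 * eps))
    return grad


def eikonal_loss(d, pairs, speed=1.0):
    """Mean squared residual of (speed * ||grad_s d(s, g)|| - 1) over sampled pairs.

    Assumption: unit-speed Eikonal form; no trajectories or actions are needed,
    only sampled (state, goal) pairs.
    """
    total = 0.0
    for s, g in pairs:
        norm = math.sqrt(sum(x * x for x in grad_wrt_state(d, s, g)))
        total += (speed * norm - 1.0) ** 2
    return total / len(pairs)


# Sample states and goals independently; goals are kept away from states so the
# toy distance is differentiable at every sampled state.
rng = random.Random(0)
pairs = [
    ([rng.uniform(-1, 1), rng.uniform(-1, 1)],
     [rng.uniform(2, 3), rng.uniform(2, 3)])
    for _ in range(64)
]

# The Euclidean distance satisfies the unit-speed Eikonal equation, so its
# residual is near zero; a distance scaled by 2 has gradient norm 2 and does not.
loss_exact = eikonal_loss(euclidean_distance, pairs)
loss_scaled = eikonal_loss(lambda s, g: 2.0 * euclidean_distance(s, g), pairs)
```

In a learned setting the candidate `d` would be a quasimetric network and the residual would be minimized by gradient descent; the point of the sketch is only that the constraint is evaluated pointwise at sampled states and goals.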
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Goal-conditioned Reinforcement Learning | antmaze stitch large | Success Rate | 88 | 23 |
| Goal-conditioned Reinforcement Learning | antmaze stitch medium | Success Rate | 0.94 | 23 |
| Goal-conditioned Reinforcement Learning | humanoidmaze stitch large | Success Rate | 63 | 14 |
| Goal-conditioned Reinforcement Learning | antsoccer stitch arena | Success Rate | 32 | 14 |
| Goal-conditioned Reinforcement Learning | humanoidmaze stitch medium | Success Rate | 85 | 14 |
| Goal-conditioned Reinforcement Learning | manipulation scene-play | Success Rate | 0.55 | 14 |
| Goal-conditioned Reinforcement Learning | pointmaze navigate medium | Success Rate | 99 | 11 |
| Goal-conditioned Reinforcement Learning | manipulation cube-single-play | Success Rate | 12 | 11 |
| Goal-conditioned Reinforcement Learning | manipulation-cube-single-play (test) | Success Rate | 0.12 | 11 |
| Goal-conditioned Reinforcement Learning | antsoccer-navigate-arena (test) | Success Rate | 61 | 5 |