Dual Goal Representations
About
In this work, we introduce dual goal representations for goal-conditioned reinforcement learning (GCRL). A dual goal representation characterizes a state by "the set of temporal distances from all other states"; in other words, it encodes a state through its relations to every other state, measured by temporal distance. This representation provides several appealing theoretical properties. First, it depends only on the intrinsic dynamics of the environment and is invariant to the original state representation. Second, it contains provably sufficient information to recover an optimal goal-reaching policy, while being able to filter out exogenous noise. Based on this concept, we develop a practical goal representation learning method that can be combined with any existing GCRL algorithm. Through diverse experiments on the OGBench task suite, we empirically show that dual goal representations consistently improve offline goal-reaching performance across 20 state- and pixel-based tasks.
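As a rough illustration of the idea, the sketch below computes a tabular dual goal representation on a small deterministic graph. It assumes that temporal distance means the shortest-path step count, and the environment `successors` and the helper names are hypothetical. It is not the learned method from the paper, only a toy that shows a goal being encoded by the distances from every state to it, and a greedy goal-reaching action being read off that encoding.

```python
from collections import deque

def temporal_distances_to(goal, successors):
    """Steps needed to reach `goal` from each state (BFS on the reversed graph)."""
    # Invert the transition graph so we can search backwards from the goal.
    preds = {s: [] for s in successors}
    for s, nexts in successors.items():
        for n in nexts:
            preds[n].append(s)
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        cur = queue.popleft()
        for p in preds[cur]:
            if p not in dist:
                dist[p] = dist[cur] + 1
                queue.append(p)
    return dist

def dual_representation(goal, successors):
    """Encode `goal` as the vector of temporal distances from every state to it."""
    dist = temporal_distances_to(goal, successors)
    # States are assumed to be the integers 0..n-1, so index i holds d(i, goal).
    return [dist.get(s, float("inf")) for s in range(len(successors))]

def greedy_action(state, goal_rep, successors):
    """Pick the successor that is closest to the goal according to `goal_rep`."""
    return min(successors[state], key=lambda n: goal_rep[n])

# Hypothetical toy environment: a 5-state chain 0-1-2-3-4 with a one-way shortcut 0 -> 3.
successors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

rep_goal_4 = dual_representation(4, successors)
next_state = greedy_action(0, rep_goal_4, successors)
print(rep_goal_4)   # [2, 3, 2, 1, 0]
print(next_state)   # 3
```

The vector depends only on the transition structure, so relabeling or re-encoding the states would leave the distances unchanged. In this toy setting, following the smallest entry among the successors at every step traces a shortest path to the goal.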
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Goal-conditioned Reinforcement Learning | manipulation-cube-single-play (test) | Success Rate | 0.89 | 11 |
| Goal-conditioned Reinforcement Learning | pointmaze navigate medium | Success Rate | 76 | 11 |
| Goal-Conditioned Reinforcement Learning (Manipulation) | cube-double-play state-based v0 (test) | Success Rate | 60 | 6 |
| Goal-Conditioned Reinforcement Learning (Manipulation) | scene-play state-based v0 (test) | Success Rate | 72 | 6 |
| Goal-Conditioned Reinforcement Learning (Manipulation) | puzzle-4x4-play state-based v0 (test) | Success Rate | 23 | 6 |
| Goal-Conditioned Reinforcement Learning (Navigation) | antmaze medium-navigate state-based v0 (test) | Success Rate | 75 | 6 |
| Goal-Conditioned Reinforcement Learning (Navigation) | antmaze-large-navigate state-based v0 (test) | Success Rate | 28 | 6 |
| Goal-Conditioned Reinforcement Learning (Navigation) | humanoidmaze medium-navigate (state-based) v0 (test) | Success Rate | 29 | 6 |
| Goal-Conditioned Reinforcement Learning (Navigation) | pointmaze-large-navigate state-based v0 (test) | Success Rate | 46 | 6 |
| Goal-Conditioned Reinforcement Learning (Navigation) | humanoidmaze-large-navigate state-based v0 (test) | Success Rate | 3 | 6 |