Abstraction for Offline Goal-Conditioned Reinforcement Learning
About
In real-world Goal-Conditioned Reinforcement Learning (GCRL), Markov Decision Processes (MDPs) often exhibit significant redundancy due to symmetries and structure shared across state-goal pairs. While hierarchical policies have been motivated for horizon reduction via temporal abstraction in offline GCRL, we demonstrate that hierarchy also enables absolute abstraction. By introducing relativised options, together with distinct representations for different levels of the hierarchy, we show how an agent can reuse experience across similar contexts in the state space. Based on this framework, we introduce two simple algorithms for learning relativised options and abstracting from the absolute frame of reference. Our experiments show that such inductive biases significantly improve performance in offline GCRL.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Goal-Conditioned Reinforcement Learning | antmaze giant-stitch v0 | Success Rate | 32 | 13 |
| Offline Goal-Conditioned Reinforcement Learning | humanoidmaze giant-navigate v0 | Success Rate | 49 | 13 |
| Offline Goal-Conditioned Reinforcement Learning | antmaze teleport-stitch v0 | Success Rate | 41 | 13 |
| Offline Goal-Conditioned Reinforcement Learning | puzzle 3x3 play v0 | Success Rate | 86 | 13 |
| Offline Goal-Conditioned Reinforcement Learning | antmaze giant-navigate v0 | Success Rate | 48 | 13 |
| Offline Goal-Conditioned Reinforcement Learning | humanoidmaze giant-stitch v0 | Success Rate | 13 | 6 |
| Offline Goal-Conditioned Reinforcement Learning | cube-double-play v0 | Average Binary Success Rate | 67 | 6 |
| Offline Goal-Conditioned Reinforcement Learning | cube-triple-play v0 | Average Binary Success Rate | 15 | 6 |
| Offline Goal-Conditioned Reinforcement Learning | cube-quadruple-play v0 | Average Binary Success Rate | 100 | 6 |
| Offline Goal-Conditioned Reinforcement Learning | puzzle 4x4 play v0 | Average Binary Success Rate | 88 | 6 |