Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following
About
Effective task representations should facilitate compositionality, such that after learning a variety of basic tasks, an agent can perform compound tasks consisting of multiple steps simply by composing the representations of the constituent steps together. While this is conceptually simple and appealing, it is not clear how to automatically learn representations that enable this sort of compositionality. We show that learning to associate the representations of current and future states with a temporal alignment loss can improve compositional generalization, even in the absence of any explicit subtask planning or reinforcement learning. We evaluate our approach across diverse robotic manipulation tasks as well as in simulation, showing substantial improvements for tasks specified with either language or goal images.
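As a rough illustration of what "associating the representations of current and future states" could look like, the sketch below computes a symmetric InfoNCE-style contrastive loss over a batch of paired representations. This is an assumption made for illustration: the abstract does not specify the exact form of the temporal alignment loss, and the names `temporal_alignment_loss`, `current_reps`, `future_reps`, and `temperature` are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v, eps=1e-12):
    """Scale a vector to unit length (eps guards against zero vectors)."""
    norm = math.sqrt(dot(v, v)) + eps
    return [a / norm for a in v]


def temporal_alignment_loss(current_reps, future_reps, temperature=0.1):
    """Symmetric InfoNCE-style loss (an assumed form, not the paper's).

    current_reps[i] and future_reps[i] are representations of a state and
    of a state reached later in the same trajectory. Other items in the
    batch serve as negatives.
    """
    cur = [normalize(v) for v in current_reps]
    fut = [normalize(v) for v in future_reps]
    n = len(cur)

    # logits[i][j]: scaled cosine similarity of current i and future j.
    logits = [[dot(cur[i], fut[j]) / temperature for j in range(n)]
              for i in range(n)]

    def cross_entropy_on_diagonal(rows):
        # Mean of -log softmax(row)[i], with the matching pair as the target.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(transposed))


# Toy batch: matched pairs should score a lower loss than mismatched pairs.
current = [[1.0, 0.0], [0.0, 1.0]]
aligned_future = [[1.0, 0.0], [0.0, 1.0]]
shuffled_future = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = temporal_alignment_loss(current, aligned_future)
shuffled_loss = temporal_alignment_loss(current, shuffled_future)
```

In this toy batch the loss is near zero when each current-state representation matches its own future and large when the pairs are swapped, which is the pressure such a loss would place on an encoder during training.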
Related benchmarks
| Task | Dataset | Success Rate | Rank |
|---|---|---|---|
| Goal-conditioned Reinforcement Learning | manipulation-cube-single-play (test) | 0.4 | 11 |
| Goal-conditioned Reinforcement Learning | pointmaze navigate medium | 3 | 11 |
| Goal-Conditioned Reinforcement Learning (Navigation) | antmaze-large-navigate state-based v0 (test) | 22 | 6 |
| Goal-Conditioned Reinforcement Learning (Manipulation) | puzzle-3x3-play state-based v0 (test) | 5 | 6 |
| Goal-Conditioned Reinforcement Learning (Manipulation) | puzzle-4x4-play state-based v0 (test) | 10 | 6 |
| Goal-Conditioned Reinforcement Learning (Manipulation) | cube-double-play state-based v0 (test) | 7 | 6 |
| Goal-Conditioned Reinforcement Learning (Manipulation) | scene-play state-based v0 (test) | 46 | 6 |
| Goal-Conditioned Reinforcement Learning (Navigation) | antmaze-giant-navigate state-based v0 (test) | 0 | 6 |
| Goal-Conditioned Reinforcement Learning (Navigation) | humanoidmaze medium-navigate (state-based) v0 (test) | 21 | 6 |
| Goal-Conditioned Reinforcement Learning (Navigation) | humanoidmaze-large-navigate state-based v0 (test) | 2 | 6 |