Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning
About
While goal-conditioned behavior cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e. combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC; if temporally correlated states are properly encoded to similar latent representations, then the out-of-distribution gap for novel state-goal pairs would be reduced. We formalize this notion by demonstrating how encouraging long-range temporal consistency via successor representations (SR) can facilitate generalization. We then propose a simple yet effective representation learning objective, $\text{BYOL-}\gamma$ for GCBC, which theoretically approximates the successor representation in the finite MDP case through self-predictive representations, and achieves competitive empirical performance across a suite of challenging tasks requiring combinatorial generalization.
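As a rough illustration of the successor representation mentioned above: in a finite MDP under a fixed policy, the SR is $M = \sum_{t \ge 0} \gamma^t P^t = (I - \gamma P)^{-1}$, where $P$ is the policy's state-transition matrix. The sketch below computes $M$ by fixed-point iteration on a toy 4-state chain. It then checks that sampling a future state at a geometrically distributed offset $k$, with $\Pr(k) = (1-\gamma)\gamma^k$, recovers the normalized row $(1-\gamma)M[s]$. The chain, `GAMMA`, and the function names are hypothetical. We also assume, without confirmation from the source, that geometric sampling of prediction targets is the role $\gamma$ plays in $\text{BYOL-}\gamma$. This is not the authors' implementation.

```python
import random

# Hypothetical toy example: a 4-state lazy random walk with reflecting ends.
# Row i of P is the distribution over next states from state i (rows sum to 1).
P = [
    [0.50, 0.50, 0.00, 0.00],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.00, 0.00, 0.50, 0.50],
]
GAMMA = 0.9


def successor_representation(P, gamma, iters=1000):
    """Iterate M <- I + gamma * P M, which converges to (I - gamma P)^{-1} for gamma < 1."""
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(iters):
        M = [
            [identity[i][j] + gamma * sum(P[i][k] * M[k][j] for k in range(n))
             for j in range(n)]
            for i in range(n)
        ]
    return M


def sample_future_state(P, s, gamma, rng):
    """Walk k steps from s, where Pr(k) = (1 - gamma) * gamma**k for k = 0, 1, 2, ..."""
    while rng.random() < gamma:  # each additional step is taken with probability gamma
        s = rng.choices(range(len(P)), weights=P[s])[0]
    return s


M = successor_representation(P, GAMMA)

# Monte Carlo estimate of the discounted future-state distribution from state 0.
rng = random.Random(0)
n_samples = 20000
counts = [0] * len(P)
for _ in range(n_samples):
    counts[sample_future_state(P, 0, GAMMA, rng)] += 1
empirical = [c / n_samples for c in counts]
analytic = [(1 - GAMMA) * m for m in M[0]]

for j, (e, a) in enumerate(zip(empirical, analytic)):
    print(f"state {j}: sampled {e:.3f}  analytic {a:.3f}")
```

Fixed-point iteration is used here only to avoid a matrix-inverse dependency. With NumPy, one could instead solve $(I - \gamma P)M = I$ directly with `numpy.linalg.solve`. Each row of $M$ sums to $1/(1-\gamma)$, so multiplying by $(1-\gamma)$ turns it into a distribution over discounted future states.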
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Goal-conditioned Reinforcement Learning | manipulation-cube-single-play (test) | Success Rate | 0.51 | 11 |
| Goal-conditioned Reinforcement Learning | pointmaze navigate medium | Success Rate | 37 | 11 |
| Goal-conditioned Reinforcement Learning | OGBench scene play (5 tasks) zero-shot | Average Return | 15 | 10 |
| Goal-conditioned Reinforcement Learning | OGBench cube single play (5 tasks) zero-shot | Average Return | 13 | 6 |
| Goal-conditioned Reinforcement Learning | OGBench antmaze teleport navigate (5 tasks) zero-shot | Average Return | 16 | 6 |
| Unsupervised Reinforcement Learning | ExORL quadruped zero-shot | Average Return | 496 | 6 |
| Goal-Conditioned Reinforcement Learning (Navigation) | pointmaze-large-navigate state-based v0 (test) | Success Rate | 22 | 6 |
| Goal-Conditioned Reinforcement Learning (Navigation) | antmaze medium-navigate state-based v0 (test) | Success Rate | 39 | 6 |
| Goal-Conditioned Reinforcement Learning (Navigation) | antmaze-large-navigate state-based v0 (test) | Success Rate | 11 | 6 |
| Goal-Conditioned Reinforcement Learning (Navigation) | antsoccer-arena-navigate state-based v0 (test) | Success Rate | 11 | 6 |