Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning
About
While goal-conditioned behavioral cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e., combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC: if temporally correlated states were encoded to similar latent representations, the out-of-distribution gap for novel state-goal pairs would be reduced. We formalize this notion by demonstrating how encouraging long-range temporal consistency via successor representations (SR) can facilitate generalization. We then propose a simple yet effective representation learning objective for GCBC, $\text{BYOL-}\gamma$, which theoretically approximates the successor representation in the finite MDP case through self-predictive representations and achieves competitive empirical performance across a suite of challenging tasks requiring combinatorial generalization.
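To make the successor representation concrete, the following is a minimal sketch of the standard tabular definition, $M = \sum_{t \ge 0} \gamma^t P^t = (I - \gamma P)^{-1}$, for a finite Markov chain. It is not the $\text{BYOL-}\gamma$ objective itself. The 4-state chain, the discount value, and the helper names (`matmul`, `successor_representation`, `l1_distance`) are assumptions made for illustration, and the paper may use a different normalization or time indexing.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def successor_representation(P, gamma, iters=500):
    """Approximate M = sum_t gamma^t P^t by iterating M <- I + gamma * P @ M."""
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(iters):
        PM = matmul(P, M)
        M = [
            [identity[i][j] + gamma * PM[i][j] for j in range(n)]
            for i in range(n)
        ]
    return M


def l1_distance(u, v):
    """L1 distance between two SR rows (one row per start state)."""
    return sum(abs(x - y) for x, y in zip(u, v))


# Hypothetical deterministic chain 0 -> 1 -> 2 -> 3, with state 3 absorbing.
P = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
]
gamma = 0.9  # assumed discount factor

M = successor_representation(P, gamma)

# Rows of M act as state representations: temporally adjacent states
# (0 and 1) end up closer than temporally distant ones (0 and 3).
dist_near = l1_distance(M[0], M[1])
dist_far = l1_distance(M[0], M[3])
print(dist_near < dist_far)
```

In this toy chain each row of `M` sums to $1/(1-\gamma)$, and states that are close in time have similar rows, which is the kind of temporal consistency the abstract argues should narrow the gap for unseen state-goal pairs.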
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | D4RL Franka Kitchen | Mixed Success Rate | 69 | 43 |
| Goal-conditioned Reinforcement Learning | antmaze stitch medium | Success Rate | 68 | 23 |
| Goal-conditioned Reinforcement Learning | antmaze stitch large | Success Rate | 26 | 23 |
| Robotic Manipulation | D4RL Kitchen-Partial | Normalized Score | 75 | 23 |
| Goal-conditioned Reinforcement Learning | antsoccer stitch arena | Success Rate | 25 | 14 |
| Goal-conditioned Reinforcement Learning | manipulation scene-play | Success Rate | 17 | 14 |
| Goal-conditioned Reinforcement Learning | humanoidmaze stitch medium | Success Rate | 51 | 14 |
| Goal-conditioned Reinforcement Learning | humanoidmaze stitch large | Success Rate | 13 | 14 |
| Robotic Manipulation | D4RL Kitchen-Mixed | -- | -- | 14 |
| Goal-conditioned Reinforcement Learning | manipulation-cube-single-play (test) | Success Rate | 0.51 | 11 |