Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning

About

While goal-conditioned behavior cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e. combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC; if temporally correlated states are properly encoded to similar latent representations, then the out-of-distribution gap for novel state-goal pairs would be reduced. We formalize this notion by demonstrating how encouraging long-range temporal consistency via successor representations (SR) can facilitate generalization. We then propose a simple yet effective representation learning objective, $\text{BYOL-}\gamma$ for GCBC, which theoretically approximates the successor representation in the finite MDP case through self-predictive representations, and achieves competitive empirical performance across a suite of challenging tasks requiring combinatorial generalization.
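The abstract refers to the successor representation in the finite MDP case without giving formulas. As a rough illustration only, the sketch below uses the standard textbook definition of the SR under a fixed policy, $M = \sum_{t \ge 0} \gamma^t P^t = (I - \gamma P)^{-1}$, on a hypothetical 5-state chain. The chain, the function names, and the cosine comparison are assumptions made for this example; this is not the paper's BYOL-$\gamma$ objective or code.

```python
def chain_transitions(n, p_right=0.9):
    """Hypothetical chain: move right w.p. p_right, stay otherwise; last state absorbs."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        P[i][i] = 1.0 - p_right
        P[i][i + 1] = p_right
    P[n - 1][n - 1] = 1.0
    return P


def successor_representation(P, gamma, iters=500):
    """Fixed-point iteration M <- I + gamma * P M, which converges to
    (I - gamma P)^{-1} for gamma < 1."""
    n = len(P)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in eye]
    for _ in range(iters):
        M = [
            [eye[i][j] + gamma * sum(P[i][k] * M[k][j] for k in range(n))
             for j in range(n)]
            for i in range(n)
        ]
    return M


def cosine(u, v):
    """Cosine similarity between two vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)


gamma = 0.9
P = chain_transitions(5)
M = successor_representation(P, gamma)

# Row i of M is the expected discounted future occupancy starting from state i.
near = cosine(M[0], M[1])  # temporally adjacent states
far = cosine(M[0], M[4])   # states four steps apart
print(f"similarity of adjacent states: {near:.3f}")
print(f"similarity of distant states:  {far:.3f}")
```

In this toy chain, the SR rows of adjacent states are more similar than those of distant states, which is the kind of temporal consistency the abstract argues should narrow the gap for novel state-goal pairs.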

Daniel Lawson, Adriana Hugessen, Charlotte Cloutier, Glen Berseth, Khimya Khetarpal • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Goal-conditioned Reinforcement Learning | manipulation-cube-single-play (test) | Success Rate | 0.51 | 11 |
| Goal-conditioned Reinforcement Learning | pointmaze navigate medium | Success Rate | 37 | 11 |
| Goal-conditioned Reinforcement Learning | OGBench scene play (5 tasks) zero-shot | Average Return | 15 | 10 |
| Goal-conditioned Reinforcement Learning | OGBench cube single play (5 tasks) zero-shot | Average Return | 13 | 6 |
| Goal-conditioned Reinforcement Learning | OGBench antmaze teleport navigate (5 tasks) zero-shot | Average Return | 16 | 6 |
| Unsupervised Reinforcement Learning | ExORL quadruped zero-shot | Average Return | 496 | 6 |
| Goal-Conditioned Reinforcement Learning (Navigation) | pointmaze-large-navigate state-based v0 (test) | Success Rate | 22 | 6 |
| Goal-Conditioned Reinforcement Learning (Navigation) | antmaze medium-navigate state-based v0 (test) | Success Rate | 39 | 6 |
| Goal-Conditioned Reinforcement Learning (Navigation) | antmaze-large-navigate state-based v0 (test) | Success Rate | 11 | 6 |
| Goal-Conditioned Reinforcement Learning (Navigation) | antsoccer-arena-navigate state-based v0 (test) | Success Rate | 11 | 6 |
Showing 10 of 22 rows
