Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning

About

While goal-conditioned behavior cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e., combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC: if temporally correlated states were encoded to similar latent representations, the out-of-distribution gap for novel state-goal pairs would be reduced. We formalize this notion by demonstrating how encouraging long-range temporal consistency via successor representations (SR) can facilitate generalization. We then propose a simple yet effective representation learning objective for GCBC, $\text{BYOL-}\gamma$, which theoretically approximates the successor representation in the finite MDP case through self-predictive representations and achieves competitive empirical performance across a suite of challenging tasks requiring combinatorial generalization.

Daniel Lawson, Adriana Hugessen, Charlotte Cloutier, Glen Berseth, Khimya Khetarpal • 2025
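
To make the successor representation mentioned in the abstract concrete, here is a minimal sketch of its standard closed form for a finite Markov chain under a fixed policy, M = (I - γP)^{-1}. It is not the paper's BYOL-γ objective. The 5-state chain, the function names, and the value of γ are hypothetical and chosen only for illustration. The sketch shows that states close in time get closer SR rows than states far apart, which is the temporal-consistency property the abstract refers to.

```python
import numpy as np


def successor_representation(P: np.ndarray, gamma: float) -> np.ndarray:
    """Closed-form SR of a finite Markov chain: M = sum_t gamma^t P^t = (I - gamma P)^{-1}."""
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)


def make_chain(n: int) -> np.ndarray:
    """Hypothetical toy chain: state i moves to i + 1; the last state is absorbing."""
    P = np.zeros((n, n))
    for i in range(n - 1):
        P[i, i + 1] = 1.0
    P[n - 1, n - 1] = 1.0
    return P


gamma = 0.9  # assumed discount factor, for illustration only
P = make_chain(5)
M = successor_representation(P, gamma)

# Distance between SR rows of temporally adjacent states vs. distant states.
dist_near = np.linalg.norm(M[0] - M[1])
dist_far = np.linalg.norm(M[0] - M[4])
print(f"||M[0] - M[1]|| = {dist_near:.3f}")
print(f"||M[0] - M[4]|| = {dist_far:.3f}")
```

Each row of M holds the expected discounted future occupancy of every state, so the rows satisfy the Bellman-style identity M = I + γPM and sum to 1 / (1 - γ).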

Related benchmarks

Task	Dataset	Result
Offline Reinforcement Learning	D4RL Franka Kitchen	Mixed Success Rate 69	43
Goal-conditioned Reinforcement Learning	antmaze stitch medium	Success Rate 68	23
Goal-conditioned Reinforcement Learning	antmaze stitch large	Success Rate 26	23
Robotic Manipulation	D4RL Kitchen-Partial	Normalized Score 75	23
Goal-conditioned Reinforcement Learning	antsoccer stitch arena	Success Rate 25	14
Goal-conditioned Reinforcement Learning	manipulation scene-play	Success Rate 17	14
Goal-conditioned Reinforcement Learning	humanoidmaze stitch medium	Success Rate 51	14
Goal-conditioned Reinforcement Learning	humanoidmaze stitch large	Success Rate 13	14
Robotic Manipulation	D4RL Kitchen-Mixed	--	14
Goal-conditioned Reinforcement Learning	manipulation-cube-single-play (test)	Success Rate 0.51	11

Showing 10 of 39 rows
