Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following

About

Effective task representations should facilitate compositionality, such that after learning a variety of basic tasks, an agent can perform compound tasks consisting of multiple steps simply by composing the representations of the constituent steps together. While this is conceptually simple and appealing, it is not clear how to automatically learn representations that enable this sort of compositionality. We show that learning to associate the representations of current and future states with a temporal alignment loss can improve compositional generalization, even in the absence of any explicit subtask planning or reinforcement learning. We evaluate our approach across diverse robotic manipulation tasks as well as in simulation, showing substantial improvements for tasks specified with either language or goal images.
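The abstract does not spell out the temporal alignment loss, so the following is only a minimal sketch of one common way to associate current-state and future-state representations: an InfoNCE-style contrastive objective. The function name `temporal_alignment_loss`, the `temperature` parameter, and the toy vectors are illustrative assumptions, not the authors' implementation.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def temporal_alignment_loss(current_reps, future_reps, temperature=1.0):
    """Hypothetical InfoNCE-style loss (an assumption, not the paper's code).

    current_reps[i] is the representation of a state, and future_reps[i] is
    the representation of a state reached later in the same trajectory.
    The other entries of future_reps act as negatives.
    """
    n = len(current_reps)
    logits = [
        [dot(c, f) / temperature for f in future_reps] for c in current_reps
    ]
    # Cross-entropy with the matching future (the diagonal) as the target.
    total = -sum(log_softmax(logits[i])[i] for i in range(n))
    return total / n


# Toy batch: each current representation points the same way as its own future.
current = [[1.0, 0.0], [0.0, 1.0]]
aligned_future = [[2.0, 0.0], [0.0, 2.0]]
shuffled_future = [[0.0, 2.0], [2.0, 0.0]]

aligned_loss = temporal_alignment_loss(current, aligned_future)
shuffled_loss = temporal_alignment_loss(current, shuffled_future)
print(aligned_loss < shuffled_loss)
```

In this toy batch the loss is lower when each current representation is paired with its own future than when the pairs are shuffled, which is the behavior such an alignment objective is meant to reward.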

Vivek Myers, Bill Chunyuan Zheng, Anca Dragan, Kuan Fang, Sergey Levine • 2025

Related benchmarks

Task	Dataset	Metric	Result
Offline Reinforcement Learning	D4RL Franka Kitchen	Mixed Success Rate	85	43
Robotic Manipulation	D4RL Kitchen-Partial	Normalized Score	100	23
Goal-conditioned Reinforcement Learning	antmaze stitch medium	Success Rate	54	23
Goal-conditioned Reinforcement Learning	antmaze stitch large	Success Rate	17	23
Goal-conditioned Reinforcement Learning	manipulation scene-play	Success Rate	16	14
Goal-conditioned Reinforcement Learning	humanoidmaze stitch medium	Success Rate	45	14
Goal-conditioned Reinforcement Learning	humanoidmaze stitch large	Success Rate	5	14
Goal-conditioned Reinforcement Learning	antsoccer stitch arena	Success Rate	14	14
Robotic Manipulation	D4RL Kitchen-Mixed	--	--	14
Goal-conditioned Reinforcement Learning	manipulation-cube-single-play (test)	Success Rate	0.4	11

Showing 10 of 25 rows
