Do You Know Where Your Camera Is? View-Invariant Policy Learning with Camera Conditioning

About

We study view-invariant imitation learning by explicitly conditioning policies on camera extrinsics. Using Plücker embeddings of per-pixel rays, we show that conditioning on extrinsics significantly improves generalization across viewpoints for standard behavior cloning policies, including ACT, Diffusion Policy, and SmolVLA. To evaluate policy robustness under realistic viewpoint shifts, we introduce six manipulation tasks in RoboSuite and ManiSkill that pair "fixed" and "randomized" scene variants, decoupling background cues from camera pose. Our analysis reveals that policies without extrinsics often infer camera pose using visual cues from static backgrounds in fixed scenes; this shortcut collapses when workspace geometry or camera placement shifts. Conditioning on extrinsics restores performance and yields robust RGB-only control without depth. We release the tasks, demonstrations, and code at https://ripl.github.io/know_your_camera/ .

Tianchong Jiang, Jingtian Ji, Xiangshan Tan, Jiading Fang, Anand Bhattad, Vitor Guizilini, Matthew R. Walter • 2025
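
To make the idea of a per-pixel Plücker ray embedding concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the released implementation: the function name plucker_ray_map, the pinhole intrinsics matrix K, the 4x4 camera-to-world pose, the pixel-center offset of 0.5, and the (direction, moment) channel ordering are all hypothetical choices made here. Each pixel's ray is written as a unit direction d in the world frame together with its moment m = o x d, where o is the camera center.

```python
import numpy as np


def plucker_ray_map(K, cam_to_world, height, width):
    """Return an (H, W, 6) array holding (d, o x d) for every pixel.

    K            : (3, 3) pinhole intrinsics (assumed convention).
    cam_to_world : (4, 4) camera-to-world pose (assumed convention).
    """
    # Pixel-center coordinates: u runs along the width, v along the height.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)  # (H, W, 3)

    # Back-project through the intrinsics to get ray directions
    # in the camera frame.
    dirs_cam = pixels @ np.linalg.inv(K).T

    # Rotate the directions into the world frame and normalize them.
    rotation = cam_to_world[:3, :3]
    origin = cam_to_world[:3, 3]
    dirs_world = dirs_cam @ rotation.T
    dirs_world = dirs_world / np.linalg.norm(dirs_world, axis=-1, keepdims=True)

    # Moment of each ray about the world origin: m = o x d.
    origins = np.broadcast_to(origin, dirs_world.shape)
    moments = np.cross(origins, dirs_world)

    return np.concatenate([dirs_world, moments], axis=-1)


if __name__ == "__main__":
    # Hypothetical camera: 64x48 image, focal length 100 px.
    K = np.array([[100.0, 0.0, 32.0],
                  [0.0, 100.0, 24.0],
                  [0.0, 0.0, 1.0]])
    pose = np.eye(4)
    pose[:3, 3] = [0.5, -0.2, 1.0]  # camera center in the world frame
    rays = plucker_ray_map(K, pose, height=48, width=64)
    print(rays.shape)
```

The resulting six-channel map has the same spatial layout as the image, so one plausible use is to stack it with the RGB channels before the visual encoder; how the paper actually injects it into each policy is not stated in this summary.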

Related benchmarks

Task	Dataset	Metric	Result
Coffee	Robosuite Seen views	Success Rate	10.7	9
Stack Three	Robosuite Seen views	Success Rate	1.9	9
Coffee	Robosuite Unseen views	Success Rate	0	9
Mug Cleanup	Robosuite Seen views	Success Rate	4	9
Mug Cleanup	Robosuite Unseen views	Success Rate	0	9
Square	Robosuite Seen views	Success Rate	14	9
Stack Three	Robosuite Unseen views	Success Rate	0	9
Lift	Robosuite Unseen views	Success Rate	46	9
Lift	Robosuite Seen views	Success Rate	72	9
Square	Robosuite Unseen views	Success Rate	0	9

Showing 10 of 14 rows
