Do You Know Where Your Camera Is? View-Invariant Policy Learning with Camera Conditioning

About

We study view-invariant imitation learning by explicitly conditioning policies on camera extrinsics. Using Plücker embeddings of per-pixel rays, we show that conditioning on extrinsics significantly improves generalization across viewpoints for standard behavior cloning policies, including ACT, Diffusion Policy, and SmolVLA. To evaluate policy robustness under realistic viewpoint shifts, we introduce six manipulation tasks in RoboSuite and ManiSkill that pair "fixed" and "randomized" scene variants, decoupling background cues from camera pose. Our analysis reveals that policies without extrinsics often infer camera pose using visual cues from static backgrounds in fixed scenes; this shortcut collapses when workspace geometry or camera placement shifts. Conditioning on extrinsics restores performance and yields robust RGB-only control without depth. We release the tasks, demonstrations, and code at https://ripl.github.io/know_your_camera/.
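To make the phrase "Plücker embeddings of per-pixel rays" concrete, here is a minimal sketch of one common way to compute such an embedding. It is not taken from the paper's released code. It assumes a pinhole camera with intrinsics `fx, fy, cx, cy`, a camera-to-world rotation `R_c2w` and translation `t_c2w`, and the ordering (direction, moment); the function names, the pixel-center offset, and the ordering are all illustrative assumptions, and the paper may use different conventions.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def matvec(m, v):
    """Multiply a 3x3 matrix (nested sequences) by a 3-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))

def plucker_ray(u, v, fx, fy, cx, cy, R_c2w, t_c2w):
    """6-D Plücker coordinates (d, o x d) of the ray through pixel (u, v).

    Assumed conventions (hypothetical, not confirmed by the source):
    pinhole intrinsics, camera-to-world extrinsics, unit direction first.
    """
    # Back-project the pixel to a ray direction in the camera frame.
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate into the world frame and normalize to unit length.
    d = matvec(R_c2w, d_cam)
    norm = math.sqrt(sum(c * c for c in d))
    d = tuple(c / norm for c in d)
    # The moment is the camera center crossed with the direction.
    m = cross(t_c2w, d)
    return d + m

def plucker_map(height, width, fx, fy, cx, cy, R_c2w, t_c2w):
    """Per-pixel embedding as a height x width grid of 6-tuples,
    sampling each ray at the pixel center (x + 0.5, y + 0.5)."""
    return [[plucker_ray(x + 0.5, y + 0.5, fx, fy, cx, cy, R_c2w, t_c2w)
             for x in range(width)]
            for y in range(height)]
```

A map like this has the same spatial layout as the image, so one plausible way to condition a policy on it is to concatenate it with the RGB channels or with the image features; how the paper feeds it to ACT, Diffusion Policy, or SmolVLA is not described on this page.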

Tianchong Jiang, Jingtian Ji, Xiangshan Tan, Jiading Fang, Anand Bhattad, Vitor Guizilini, Matthew R. Walter • 2025

Related benchmarks

| Task | Dataset | Result (Success Rate) | Rank |
|---|---|---|---|
| Coffee | Robosuite Seen views | 10.7 | 9 |
| Stack Three | Robosuite Seen views | 1.9 | 9 |
| Coffee | Robosuite Unseen views | 0 | 9 |
| Mug Cleanup | Robosuite Seen views | 4 | 9 |
| Mug Cleanup | Robosuite Unseen views | 0 | 9 |
| Square | Robosuite Seen views | 14 | 9 |
| Stack Three | Robosuite Unseen views | 0 | 9 |
| Lift | Robosuite Unseen views | 46 | 9 |
| Lift | Robosuite Seen views | 72 | 9 |
| Square | Robosuite Unseen views | 0 | 9 |
Showing 10 of 14 rows
