Learning Geometrically-Grounded 3D Visual Representations for View-Generalizable Robotic Manipulation

About

Real-world robotic manipulation demands visuomotor policies capable of robust spatial scene understanding and strong generalization across diverse camera viewpoints. While recent 3D-aware visual representations have shown promise, they still suffer from three key limitations: reliance on multi-view observations during inference, which is impractical when only a single view is available; incomplete scene modeling that fails to capture the holistic and fine-grained geometric structures essential for precise manipulation; and a lack of effective policy training strategies to retain and exploit the acquired 3D knowledge. To address these challenges, we present MethodName, a unified representation-policy learning framework for view-generalizable robotic manipulation. MethodName introduces a single-view 3D pretraining paradigm that leverages point cloud reconstruction and feed-forward Gaussian splatting under multi-view supervision to learn holistic geometric representations. During policy learning, MethodName performs multi-step distillation to preserve the pretrained geometric understanding and transfer it effectively to manipulation skills. We conduct experiments on 12 RLBench tasks, where our approach outperforms the previous state-of-the-art method by 12.7% in average success rate. Further evaluation on six representative tasks demonstrates strong zero-shot view generalization, with success rate drops of only 22.0% and 29.7% under moderate and large viewpoint shifts, respectively, whereas the state-of-the-art method suffers larger decreases of 41.6% and 51.5%.
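
The abstract reports success rate drops under viewpoint shifts but does not spell out the formula. As a minimal sketch, assuming the drop is measured relative to the training-view success rate (the benchmark table on this page uses the label "Relative Performance Drop"), the computation could look like the following. The function name `relative_drop` and the input values are hypothetical and are not taken from the paper.

```python
def relative_drop(train_view_sr: float, shifted_view_sr: float) -> float:
    """Percentage of the training-view success rate lost under a viewpoint shift.

    Assumed definition: 100 * (SR_train - SR_shift) / SR_train.
    """
    if train_view_sr <= 0:
        raise ValueError("train_view_sr must be positive")
    return 100.0 * (train_view_sr - shifted_view_sr) / train_view_sr


# Hypothetical success rates (in percent), chosen only for illustration.
print(round(relative_drop(50.0, 40.0), 1))  # 20.0
```

Under this definition, a policy that keeps its training-view success rate under a shifted camera has a drop of 0, and a lower value indicates better view generalization.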

Di Zhang, Weicheng Duan, Dasen Gu, Hongye Lu, Hai Zhang, Hang Yu, Junqiao Zhao, Guang Chen • 2026

Related benchmarks

Task	Dataset	Result
Robotic Manipulation	RLBench (test)	Average Success Rate: 44.2	49
Robot Manipulation	RLBench Moderate Shift	Average Success Rate: 41.2	11
Robot Manipulation	RLBench Large Shift	Rel. Drop (Avg): 3.7	10
Robot Manipulation	RLBench Large Shift (test)	Average SR: 39.6	8
Robotic Manipulation	RLBench Moderate Shift (test)	Success Rate: 45.9	4
Robot Manipulation	RLBench View (train)	SR (Avg): 52.9	3
Robotic Manipulation	RLBench Multi-task Train View	Relative Performance Drop: 1.7	3
Robotic Manipulation	RLBench Multi-task Moderate Shift	Relative Performance Drop: 6.9	3

