Learning Geometrically-Grounded 3D Visual Representations for View-Generalizable Robotic Manipulation

About

Real-world robotic manipulation demands visuomotor policies capable of robust spatial scene understanding and strong generalization across diverse camera viewpoints. Recent 3D-aware visual representations have shown promise, but they still suffer from three key limitations: (1) reliance on multi-view observations during inference, which is impractical when only a single view is available; (2) incomplete scene modeling that fails to capture the holistic and fine-grained geometric structures essential for precise manipulation; and (3) a lack of effective policy training strategies to retain and exploit the acquired 3D knowledge. To address these challenges, we present MethodName, a unified representation-policy learning framework for view-generalizable robotic manipulation. MethodName introduces a single-view 3D pretraining paradigm that leverages point cloud reconstruction and feed-forward Gaussian splatting under multi-view supervision to learn holistic geometric representations. During policy learning, MethodName performs multi-step distillation to preserve the pretrained geometric understanding and transfer it effectively to manipulation skills. We conduct experiments on 12 RLBench tasks, where our approach outperforms the previous state-of-the-art method by 12.7% in average success rate. Further evaluation on six representative tasks demonstrates strong zero-shot view generalization: the success rate drops by only 22.0% and 29.7% under moderate and large viewpoint shifts, respectively, whereas the state-of-the-art method suffers larger decreases of 41.6% and 51.5%.

Di Zhang, Weicheng Duan, Dasen Gu, Hongye Lu, Hai Zhang, Hang Yu, Junqiao Zhao, Guang Chen • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Robotic Manipulation | RLBench (test) | Average Success Rate | 44.2 | 34
Robot Manipulation | RLBench Moderate Shift | Average Success Rate | 41.2 | 11
Robot Manipulation | RLBench Large Shift | Rel. Drop (Avg) | 3.7 | 10
Robot Manipulation | RLBench Large Shift (test) | Average SR | 39.6 | 8
Robotic Manipulation | RLBench Moderate Shift (test) | Success Rate | 45.9 | 4
Robot Manipulation | RLBench View (train) | SR (Avg) | 52.9 | 3
Robotic Manipulation | RLBench Multi-task Train View | Relative Performance Drop | 1.7 | 3
Robotic Manipulation | RLBench Multi-task Moderate Shift | Relative Performance Drop | 6.9 | 3
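
Several rows above report a relative drop in success rate under viewpoint shift, but this page does not define the metric. The sketch below shows one common way such a figure could be computed: the per-task drop relative to the training-view success rate, averaged over tasks. The function names and the input numbers are hypothetical and are not taken from the paper; the paper's actual evaluation protocol may differ.

```python
from typing import Sequence


def relative_drop(train_view_sr: float, shifted_view_sr: float) -> float:
    """Relative success-rate drop, in percent, from the training view to a shifted view.

    Assumed definition (not confirmed by the source):
        100 * (train_view_sr - shifted_view_sr) / train_view_sr
    """
    if train_view_sr <= 0:
        raise ValueError("train_view_sr must be positive")
    return 100.0 * (train_view_sr - shifted_view_sr) / train_view_sr


def mean_relative_drop(train_view: Sequence[float], shifted_view: Sequence[float]) -> float:
    """Average the per-task relative drops over all tasks."""
    if len(train_view) != len(shifted_view):
        raise ValueError("both sequences must list the same tasks")
    if not train_view:
        raise ValueError("at least one task is required")
    drops = [relative_drop(t, s) for t, s in zip(train_view, shifted_view)]
    return sum(drops) / len(drops)


# Hypothetical per-task success rates (percent), for illustration only.
train_view = [80.0, 50.0]
shifted_view = [60.0, 40.0]
print(mean_relative_drop(train_view, shifted_view))
```

Averaging per-task relative drops generally gives a different number than taking the relative drop of the averaged success rates, so figures computed the two ways should not be compared directly.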
