
VERM: Leveraging Foundation Models to Create a Virtual Eye for Efficient 3D Robotic Manipulation

About

When performing 3D manipulation tasks, robots must plan actions based on observations from multiple fixed cameras. This multi-camera setup introduces substantial redundant and irrelevant information, which increases computational cost and forces the model to spend extra training time extracting the crucial task-relevant details. To filter out redundant information and accurately extract task-relevant features, we propose VERM (Virtual Eye for Robotic Manipulation), a method that leverages the knowledge in foundation models to imagine a virtual, task-adaptive view from the constructed 3D point cloud; this view efficiently captures the necessary information and mitigates occlusion. To facilitate 3D action planning and fine-grained manipulation, we further design a depth-aware module and a dynamic coarse-to-fine procedure. Extensive experiments on the RLBench simulation benchmark and in real-world evaluations demonstrate the effectiveness of our method, which surpasses previous state-of-the-art methods while achieving a 1.89x speedup in training time and a 1.54x speedup in inference. More results are available on our project website: https://verm-ral.github.io
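The abstract describes rendering a task-adaptive virtual view from a point cloud fused from several fixed cameras. As a rough illustration of that rendering step only, the sketch below orthographically projects a fused point cloud into a single virtual camera and keeps the nearest point per pixel (a z-buffer), so visibility is resolved from the new viewpoint; it also returns a per-pixel depth map of the kind a depth-aware module could consume. The function name `render_virtual_view`, its parameters (`eye`, `target`, `up`, `extent`, `size`), and the choice of an orthographic projection are hypothetical assumptions, not VERM's actual implementation. In VERM the virtual viewpoint is chosen with the help of foundation models; here the pose is simply passed in.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Color = Tuple[int, int, int]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(a: Vec3) -> Vec3:
    n = math.sqrt(_dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def render_virtual_view(
    points: Sequence[Vec3],
    colors: Sequence[Color],
    eye: Vec3,
    target: Vec3,
    up: Vec3,
    extent: float,
    size: int,
) -> Tuple[List[List[Optional[Color]]], List[List[Optional[float]]]]:
    """Hypothetical orthographic re-rendering of a fused point cloud.

    eye/target/up define the virtual camera pose (assumed given here).
    extent is the half-width of the square viewing window in world units;
    size is the output resolution in pixels.
    Returns (rgb, depth) grids indexed [row][col]; empty pixels are None.
    """
    forward = _normalize(_sub(target, eye))  # viewing direction
    right = _normalize(_cross(forward, up))  # image x-axis (columns)
    down = _cross(forward, right)            # image y-axis (rows grow downward)
    rgb: List[List[Optional[Color]]] = [[None] * size for _ in range(size)]
    depth: List[List[Optional[float]]] = [[None] * size for _ in range(size)]
    for p, c in zip(points, colors):
        rel = _sub(p, eye)
        z = _dot(rel, forward)               # distance along the view axis
        if z <= 0:
            continue                         # point is behind the camera
        # Map window coordinates from [-extent, extent] to [0, 1], then to pixel indices.
        u = (_dot(rel, right) + extent) / (2 * extent)
        v = (_dot(rel, down) + extent) / (2 * extent)
        col, row = math.floor(u * size), math.floor(v * size)
        if not (0 <= col < size and 0 <= row < size):
            continue                         # outside the viewing window
        d = depth[row][col]
        if d is None or z < d:               # z-buffer: keep the nearest point
            depth[row][col] = z
            rgb[row][col] = c
    return rgb, depth
```

For example, with a top-down camera at `eye=(0, 0, 1)` looking at the origin, a point at height 0.5 hides a point directly beneath it at height 0, so the center pixel takes the higher point's color and a depth of 0.5.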

Yixiang Chen, Yan Huang, Keji He, Peiyan Li, Liang Wang · 2025

Related benchmarks

Task | Dataset | Metric | Value (%) | Rank
Multi-task robotic manipulation | RLBench | Avg. success rate | 83.6 | 16
Close drawer | Real-world Evaluation 1.0 (unseen object placements) | Success rate | 100 | 4
Flip cup | Real-world Evaluation 1.0 (unseen object placements) | Success rate | 80 | 4
Multi-task robotic manipulation | Real-world Evaluation 1.0 (unseen object placements) | Success rate | 80 | 4
Open cabinet | Real-world Evaluation 1.0 (unseen object placements) | Success rate | 80 | 4
Press sanitizer | Real-world Evaluation 1.0 (unseen object placements) | Success rate | 90 | 4
Put block in bowl | Real-world Evaluation 1.0 (unseen object placements) | Success rate | 80 | 4
Put object in drawer | Real-world Evaluation 1.0 (unseen object placements) | Success rate | 70 | 4
Put object in shelf | Real-world Evaluation 1.0 (unseen object placements) | Success rate | 80 | 4
Stack blocks | Real-world Evaluation 1.0 (unseen object placements) | Success rate | 80 | 4
