
OCRA: Object-Centric Learning with 3D and Tactile Priors for Human-to-Robot Action Transfer

About

We present OCRA, an Object-Centric framework for video-based human-to-Robot Action transfer that learns directly from human demonstration videos to enable robust manipulation. Object-centric learning emphasizes task-relevant objects and their interactions while filtering out irrelevant background, providing a natural and scalable way to teach robots. OCRA leverages multi-view RGB videos, the state-of-the-art 3D foundation model VGGT, and advanced detection and segmentation models to reconstruct object-centric 3D point clouds, capturing rich interactions between objects. To handle properties not easily perceived by vision alone, we incorporate tactile priors via a large-scale dataset of over one million tactile images. These 3D and tactile priors are fused through a multimodal module (ResFiLM) and fed into a Diffusion Policy to generate robust manipulation actions. Extensive experiments on both vision-only and visuo-tactile tasks show that OCRA significantly outperforms existing baselines and ablations, demonstrating its effectiveness for learning from human demonstration videos.
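To make the fusion step concrete, below is a minimal, illustrative sketch (not the authors' released code) of how a ResFiLM-style module might inject tactile features into object-centric 3D point-cloud features before conditioning a diffusion policy. The class name, layer sizes, and feature dimensions are assumptions chosen for illustration only.

import torch
import torch.nn as nn

class ResFiLMBlock(nn.Module):
    """Hypothetical FiLM conditioning with a residual connection."""
    def __init__(self, feat_dim: int, cond_dim: int):
        super().__init__()
        # Predict per-channel scale (gamma) and shift (beta) from the tactile embedding.
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * feat_dim)
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, feat_dim),
            nn.GELU(),
            nn.Linear(feat_dim, feat_dim),
        )
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, visual_feat: torch.Tensor, tactile_feat: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.to_gamma_beta(tactile_feat).chunk(2, dim=-1)
        modulated = gamma * self.norm(visual_feat) + beta   # FiLM modulation
        return visual_feat + self.mlp(modulated)            # residual connection

# Usage sketch: fuse a pooled point-cloud embedding with a tactile embedding,
# then pass the fused vector to a diffusion policy as its observation conditioning.
visual_feat = torch.randn(8, 512)    # e.g. object-centric 3D point-cloud features
tactile_feat = torch.randn(8, 256)   # e.g. output of a tactile encoder pretrained on tactile images
fusion = ResFiLMBlock(feat_dim=512, cond_dim=256)
obs_condition = fusion(visual_feat, tactile_feat)  # shape (8, 512), fed to the policy

The FiLM-plus-residual layout is one common way to let a conditioning signal modulate features without overwriting them; the actual OCRA module may differ in structure and dimensionality.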

Kuanning Wang, Ke Fan, Yuqian Fu, Siyu Lin, Hu Luo, Daniel Seita, Yanwei Fu, Yu-Gang Jiang, Xiangyang Xue • 2026

Related benchmarks

Task   | Dataset                     | Result            | Rank
Pour   | Real-world Pouring Water    | Success Rate: 70  | 3
Scoop  | Real-world Scooping Ball    | Success Rate: 70  | 3
Stack  | Real-world Stacking Cup     | Success Rate: 1   | 3
Sweep  | Real-world Sweeping Objects | Success Rate: 100 | 3
