SPOT: SE(3) Pose Trajectory Diffusion for Object-Centric Manipulation

About

We introduce SPOT, an object-centric imitation learning framework. The key idea is to capture each task by an object-centric representation, specifically the SE(3) object pose trajectory relative to the target. This approach decouples embodiment actions from sensory inputs, which facilitates learning from various demonstration types, including both action-based and action-less human hand demonstrations, and enables cross-embodiment generalization. Additionally, object pose trajectories inherently capture planning constraints from demonstrations without the need for manually crafted rules. To guide the robot in executing the task, the object trajectory is used to condition a diffusion policy. We systematically evaluate our method in simulation and on real-world tasks. In the real-world evaluation, using only eight demonstrations shot on an iPhone, our approach completed all tasks while fully complying with task constraints. Project page: https://nvlabs.github.io/object_centric_diffusion
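
To make the "object pose trajectory relative to the target" concrete, here is a minimal sketch of how such a representation could be computed, assuming poses are given as 4x4 homogeneous transforms in a shared world frame. The function names `invert_se3` and `relative_pose_trajectory` are hypothetical and are not taken from the SPOT code base; the actual pipeline may represent poses differently.

```python
import numpy as np


def invert_se3(T):
    """Invert a 4x4 homogeneous transform via the rotation transpose."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


def relative_pose_trajectory(object_poses, target_pose):
    """Express each world-frame object pose in the target's frame.

    object_poses: (N, 4, 4) array of world-frame object poses (assumed layout).
    target_pose:  (4, 4) world-frame pose of the target object.
    Returns an (N, 4, 4) array of poses relative to the target.
    """
    object_poses = np.asarray(object_poses, dtype=float)
    # matmul broadcasts the single (4, 4) inverse over the N trajectory steps.
    return invert_se3(np.asarray(target_pose, dtype=float)) @ object_poses
```

Because the result depends only on the object and the target, the same trajectory describes the task regardless of which robot, or human hand, moved the object.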

Cheng-Chun Hsu, Bowen Wen, Jie Xu, Yashraj Narang, Xiaolong Wang, Yuke Zhu, Joydeep Biswas, Stan Birchfield • 2024

Related benchmarks

Task	Dataset	Result
6DoF object manipulation trajectory generation	HOT3D	3D Positional ADE: 1.018	19
6-DOF Object Trajectory Synthesis	HD-EPIC	ADE (m): 1.44	11
Pick-&-Place	Real-world (Unseen)	Success Rate: 52	9
Robotic Insertion	Cobot Mobile ALOHA In-distribution (train)	Task 1 Success Rate: 100	5
Pick-&-Place	RLBench Put A in B (Pose-level substitution)	Success Rate: 48	3
Pick-&-Place	RLBench Put A in B Instance-level substitution	Success Rate: 50.7	3
Pick-&-Place	RLBench Put A in B Category-level substitution	Success Rate: 48	3
Pick, Pour (L1)	Real World unknown objects	Success Rate: 76	3
Pour (Level 1)	RLBench Pour A in B (Pose-level substitution)	Success Rate: 68	3
Pour (Level 1)	RLBench Pour A in B Instance-level substitution	Success Rate: 64	3

Showing 10 of 15 rows
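
For readers unfamiliar with the ADE metric in the table, the sketch below uses the common definition of average displacement error: the mean Euclidean distance between predicted and ground-truth positions over all timesteps. This is an assumption about the general metric; the individual benchmarks (HOT3D, HD-EPIC) may apply their own alignment, units, or averaging, and the function name is hypothetical.

```python
import numpy as np


def average_displacement_error(pred_positions, gt_positions):
    """Mean Euclidean distance between matching trajectory points.

    Both inputs are (T, 3) arrays of 3D positions with the same length T
    (assumed layout; units follow the inputs, e.g. meters).
    """
    pred = np.asarray(pred_positions, dtype=float)
    gt = np.asarray(gt_positions, dtype=float)
    if pred.shape != gt.shape:
        raise ValueError("predicted and ground-truth trajectories must match in shape")
    # Per-timestep distance, then the average over the trajectory.
    return float(np.linalg.norm(pred - gt, axis=-1).mean())
```

Lower values indicate that the predicted trajectory stays closer to the ground truth.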