SPOT: SE(3) Pose Trajectory Diffusion for Object-Centric Manipulation
About
We introduce SPOT, an object-centric imitation learning framework. The key idea is to capture each task by an object-centric representation, specifically the SE(3) object pose trajectory relative to the target. This approach decouples embodiment actions from sensory inputs, which facilitates learning from various demonstration types, including both action-based and action-less human hand demonstrations, and supports cross-embodiment generalization. Additionally, object pose trajectories inherently capture planning constraints from demonstrations without the need for manually crafted rules. To guide the robot in executing the task, the object trajectory is used to condition a diffusion policy. We systematically evaluate our method in simulation and on real-world tasks. In real-world evaluation, using only eight demonstrations shot on an iPhone, our approach completed all tasks while fully complying with task constraints. Project page: https://nvlabs.github.io/object_centric_diffusion
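To make the representation concrete, the following is a minimal sketch of how an object pose trajectory could be expressed relative to a target, assuming poses are given as 4x4 homogeneous transforms in a common world frame. The helper names (`make_pose`, `invert_pose`, `relative_trajectory`, `rot_z`) are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import numpy as np


def rot_z(theta):
    """Rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def make_pose(rotation, translation):
    """Build a 4x4 homogeneous SE(3) transform from R (3x3) and t (3,)."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T


def invert_pose(T):
    """Invert an SE(3) transform in closed form: (R, t) -> (R^T, -R^T t)."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


def relative_trajectory(object_poses, target_pose):
    """Express each world-frame object pose in the target's frame.

    Assumption: all poses share one world frame. The result has shape
    (N, 4, 4), where entry i equals inv(T_target) @ T_object_i.
    """
    target_inv = invert_pose(target_pose)
    return np.stack([target_inv @ T for T in object_poses])


if __name__ == "__main__":
    # Hypothetical scene: the target sits at x = 1 m, rotated 90 degrees about z.
    target = make_pose(rot_z(np.pi / 2), [1.0, 0.0, 0.0])
    # The object approaches the target along the world y-axis.
    trajectory = [
        make_pose(rot_z(0.0), [1.0, y, 0.0]) for y in (1.0, 0.5, 0.0)
    ]
    rel = relative_trajectory(trajectory, target)
    print(rel[:, :3, 3])  # object positions expressed in the target frame
```

Because the relative trajectory depends only on the object and the target, the same sequence describes the task regardless of which hand or robot moved the object, which is the property the abstract relies on for cross-embodiment transfer.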
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 6DoF object manipulation trajectory generation | HOT3D | 3D Positional ADE | 1.018 | 19 |
| 6-DOF Object Trajectory Synthesis | HD-EPIC | ADE (m) | 1.44 | 11 |
| Pick-&-Place | Real-world (Unseen) | Success Rate | 52 | 9 |
| Robotic Insertion | Cobot Mobile ALOHA In-distribution (train) | Task 1 Success Rate | 100 | 5 |
| Pick-&-Place | RLBench Put A in B (Pose-level substitution) | Success Rate | 48 | 3 |
| Pick-&-Place | RLBench Put A in B Instance-level substitution | Success Rate | 50.7 | 3 |
| Pick-&-Place | RLBench Put A in B Category-level substitution | Success Rate | 48 | 3 |
| Pick, Pour (L1) | Real World unknown objects | Success Rate | 76 | 3 |
| Pour (Level 1) | RLBench Pour A in B (Pose-level substitution) | Success Rate | 68 | 3 |
| Pour (Level 1) | RLBench Pour A in B Instance-level substitution | Success Rate | 64 | 3 |