DeCo: Task Decomposition and Skill Composition for Zero-Shot Generalization in Long-Horizon 3D Manipulation
About
Generalizing language-conditioned multi-task imitation learning (IL) models to novel long-horizon 3D manipulation tasks is challenging. To address this, we propose DeCo (Task Decomposition and Skill Composition), a model-agnostic framework that enhances zero-shot generalization to compositional long-horizon manipulation tasks. DeCo decomposes IL demonstrations into modular atomic tasks based on gripper-object interactions, creating a dataset that enables models to learn reusable skills. At inference, DeCo uses a vision-language model (VLM) to parse high-level instructions, retrieve relevant skills, and dynamically schedule their execution. A spatially-aware skill-chaining module ensures smooth, collision-free transitions between skills. We also introduce DeCoBench, a benchmark designed to evaluate compositional generalization in long-horizon manipulation tasks. On 12 novel tasks, DeCo improves the success rates of three IL models (RVT-2, 3DDA, and ARP) by 66.67%, 21.53%, and 57.92%, respectively. In real-world experiments, the DeCo-enhanced model, trained on only 6 atomic tasks, completes 9 novel tasks zero-shot, a 53.33% improvement over the baseline model. Project website: https://deco226.github.io.
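The abstract describes the decomposition and composition steps only at a high level ("based on gripper-object interactions"). As a rough illustration, the Python sketch below assumes a demonstration is a list of keyframes carrying a binary gripper state and treats every closed-to-open transition (a release) as the end of one atomic task. It then chains skills in order, with a hard-coded plan standing in for the VLM's output. `Keyframe`, `split_into_atomic_segments`, and `compose` are hypothetical names, not DeCo's implementation, and the sketch omits the spatially-aware skill-chaining module entirely.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Keyframe:
    """One demonstration keyframe (hypothetical schema, gripper state only)."""
    gripper_open: bool  # True if the gripper is open at this keyframe


def split_into_atomic_segments(frames: List[Keyframe]) -> List[List[Keyframe]]:
    """Split a demonstration into atomic segments at release events.

    Assumption: an atomic task ends when the gripper goes from closed
    (grasping/interacting with an object) back to open (released).
    """
    segments: List[List[Keyframe]] = []
    current: List[Keyframe] = []
    for i, frame in enumerate(frames):
        current.append(frame)
        was_closed = i > 0 and not frames[i - 1].gripper_open
        if was_closed and frame.gripper_open:
            # Closed -> open transition: this interaction is finished.
            segments.append(current)
            current = []
    if current:
        # Frames after the last release (e.g. a retreat motion) join the
        # last segment; a demo with no release becomes a single segment.
        if segments:
            segments[-1].extend(current)
        else:
            segments.append(current)
    return segments


def compose(plan: List[str], skills: Dict[str, Callable[[], bool]]) -> bool:
    """Run skills in plan order; stop at the first missing or failed skill.

    `plan` stands in for the ordered subtasks a VLM would produce, and each
    skill is a callable returning True on success (both are assumptions).
    """
    for step in plan:
        skill = skills.get(step)
        if skill is None or not skill():
            return False
    return True


if __name__ == "__main__":
    demo = [Keyframe(g) for g in [True, False, False, True, True, False, True]]
    print([len(s) for s in split_into_atomic_segments(demo)])  # [4, 3]
    library = {"open_drawer": lambda: True, "put_block_in_drawer": lambda: True}
    print(compose(["open_drawer", "put_block_in_drawer"], library))  # True
```

Splitting at release events keeps each segment self-contained (approach, grasp, manipulate, release), which is one plausible way to obtain reusable atomic skills; a real system would likely also use contact or object-state signals.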
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Long-horizon Robotic Task Execution | DeCoBench Novel long-horizon tasks | Average Success Rate | 6.67e+3 | 6 |
| Robotic Planning | Tidy_House | Success Rate | 46.67 | 6 |
| Robotic Planning | Prepare_Groceries | Success Rate | 35 | 6 |
| Robotic Planning | Set Table | Success Rate | 23.67 | 6 |
| 3D Manipulation | Real-world Novel Long-Horizon Tasks | Average Success Rate | 53.33 | 2 |
| 3D Manipulation | Real-world Atomic Tasks | Average Success Rate | 88.33 | 2 |