DeCo: Task Decomposition and Skill Composition for Zero-Shot Generalization in Long-Horizon 3D Manipulation

About

Generalizing language-conditioned multi-task imitation learning (IL) models to novel long-horizon 3D manipulation tasks is challenging. To address this, we propose DeCo (Task Decomposition and Skill Composition), a model-agnostic framework that enhances zero-shot generalization to compositional long-horizon manipulation tasks. DeCo decomposes IL demonstrations into modular atomic tasks based on gripper-object interactions, creating a dataset that enables models to learn reusable skills. At inference, DeCo uses a vision-language model (VLM) to parse high-level instructions, retrieve relevant skills, and dynamically schedule their execution. A spatially-aware skill-chaining module ensures smooth, collision-free transitions between skills. We introduce DeCoBench, a benchmark designed to evaluate compositional generalization in long-horizon manipulation tasks. DeCo improves the success rate of three IL models, RVT-2, 3DDA, and ARP, by 66.67%, 21.53%, and 57.92%, respectively, on 12 novel tasks. In real-world experiments, the DeCo-enhanced model, trained on only 6 atomic tasks, completes 9 novel tasks zero-shot, a 53.33% improvement over the baseline model. Project website: https://deco226.github.io.

Zixuan Chen, Junhui Yin, Yangtao Chen, Jing Huo, Pinzhuo Tian, Jieqi Shi, Yiwen Hou, Yinchuan Li, Yang Gao • 2025
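
The decomposition step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a demonstration is available as a per-frame gripper open/closed flag and treats every flip of that flag as a boundary between atomic tasks, a simplified stand-in for the "gripper-object interactions" the abstract mentions. The names `Segment` and `split_by_gripper_state` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One candidate atomic task: a run of frames with a constant gripper state."""
    start: int            # first frame index (inclusive)
    end: int              # last frame index (inclusive)
    gripper_closed: bool  # gripper state held throughout the segment


def split_by_gripper_state(gripper_closed: List[bool]) -> List[Segment]:
    """Cut a demonstration wherever the gripper flag changes.

    Assumption (not from the paper): a change in the open/closed flag
    marks the end of one atomic task and the start of the next.
    """
    segments: List[Segment] = []
    if not gripper_closed:
        return segments

    start = 0
    for i in range(1, len(gripper_closed)):
        if gripper_closed[i] != gripper_closed[i - 1]:
            # The state flipped at frame i, so close the segment at i - 1.
            segments.append(Segment(start, i - 1, gripper_closed[start]))
            start = i
    # Close the final segment, which runs to the last frame.
    segments.append(Segment(start, len(gripper_closed) - 1, gripper_closed[start]))
    return segments


# Toy demonstration: approach (open), carry (closed), retreat (open).
demo = [False, False, True, True, True, False]
for seg in split_by_gripper_state(demo):
    print(seg)
```

Each resulting segment could then be paired with a language label and used as a training example for a reusable skill. How DeCo actually labels segments and chains skills at inference is not specified on this page.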

Related benchmarks

Task	Dataset	Metric	Result
Long-horizon Robotic Task Execution	DeCoBench Novel long-horizon tasks	Average Success Rate	6.67e+3	6
Robotic Planning	Tidy_House	Success Rate	46.67	6
Robotic Planning	Prepare_Groceries	Success Rate	35	6
Robotic Planning	Set Table	Success Rate	23.67	6
3D Manipulation	Real-world Novel Long-Horizon Tasks	Average Success Rate	53.33	2
3D Manipulation	Real-world Atomic Tasks	Average Success Rate	88.33	2
