Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
About
This paper introduces Diffusion Policy, a new way of generating robot behavior by representing a robot's visuomotor policy as a conditional denoising diffusion process. We benchmark Diffusion Policy across 12 different tasks from 4 different robot manipulation benchmarks and find that it consistently outperforms existing state-of-the-art robot learning methods with an average improvement of 46.9%. Diffusion Policy learns the gradient of the action-distribution score function and iteratively optimizes with respect to this gradient field during inference via a series of stochastic Langevin dynamics steps. We find that the diffusion formulation yields powerful advantages when used for robot policies, including gracefully handling multimodal action distributions, being suitable for high-dimensional action spaces, and exhibiting impressive training stability. To fully unlock the potential of diffusion models for visuomotor policy learning on physical robots, this paper presents a set of key technical contributions including the incorporation of receding horizon control, visual conditioning, and the time-series diffusion transformer. We hope this work will help motivate a new generation of policy learning techniques that are able to leverage the powerful generative modeling capabilities of diffusion models. Code, data, and training details are publicly available at diffusion-policy.cs.columbia.edu.
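The inference procedure described in the abstract, following a score field with stochastic Langevin dynamics steps, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. A hand-written score of a two-mode Gaussian mixture stands in for the learned, observation-conditioned network, and the names `toy_score` and `langevin_sample`, the mode locations, the step size, and the step count are all illustrative assumptions. The sketch only shows how noise is iteratively moved toward high-density actions and how a multimodal action distribution is preserved rather than averaged.

```python
import numpy as np

def toy_score(actions, modes, sigma):
    """Gradient of the log-density of an equal-weight Gaussian mixture.

    Hypothetical stand-in for a learned score network.
    actions: (N, D) candidate actions; modes: (K, D) mixture centers.
    """
    diff = modes[None, :, :] - actions[:, None, :]            # (N, K, D)
    logw = -0.5 * np.sum(diff ** 2, axis=-1) / sigma ** 2     # (N, K)
    # Softmax over modes, shifted by the max for numerical stability.
    w = np.exp(logw - logw.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return np.sum(w[:, :, None] * diff, axis=1) / sigma ** 2  # (N, D)

def langevin_sample(score_fn, x0, step_size, n_steps, rng):
    """Unadjusted Langevin dynamics: x <- x + eps * score(x) + sqrt(2 * eps) * z."""
    x = x0.copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + step_size * score_fn(x) + np.sqrt(2.0 * step_size) * noise
    return x

# Two equally valid 1-D "actions" (e.g. pass an obstacle on the left or right).
rng = np.random.default_rng(0)
modes = np.array([[-1.0], [1.0]])
x0 = rng.standard_normal((256, 1))  # start from pure Gaussian noise
samples = langevin_sample(
    lambda x: toy_score(x, modes, sigma=0.1),
    x0, step_size=0.002, n_steps=500, rng=rng,
)
```

Each sample in this toy setting settles near one of the two modes instead of their mean, which is the behavior the abstract refers to as handling multimodal action distributions. In the actual method the score would come from a trained network conditioned on visual observations, and the sampled quantity would be a sequence of future actions used with receding horizon control.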
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Robot Manipulation | LIBERO | Goal Achievement | 73.5 | 494 |
| Robot Manipulation | LIBERO (test) | Average Success Rate | 76.1 | 142 |
| Long-horizon robot manipulation | Calvin ABCD→D | Task 1 Completion Rate | 86.3 | 96 |
| Long-horizon task completion | Calvin ABC->D | Success Rate (1) | 40.2 | 67 |
| Closed-loop Planning | nuPlan 14 (val) | NR Score | 84.27 | 66 |
| Closed-loop Planning | nuPlan 14 Hard (test) | NR | 69.7 | 64 |
| Closed-loop Planning | nuPlan 14 (test) | NR | 85.62 | 45 |
| Robot Manipulation | Calvin ABC->D | Average Successful Length | 0.56 | 36 |
| Robotic Manipulation | RLBench (test) | Average Success Rate | 45.6 | 34 |
| Robotic Manipulation | LIBERO 1.0 (test) | Long | 50.5 | 30 |