Sparse ActionGen: Accelerating Diffusion Policy with Real-time Pruning
About
Diffusion Policy has dominated action generation due to its strong capabilities for modeling multi-modal action distributions, but its multi-step denoising processes make it impractical for real-time visuomotor control. Existing caching-based acceleration methods typically rely on *static* schedules that fail to adapt to the *dynamics* of robot-environment interactions, thereby leading to suboptimal performance. In this paper, we propose Sparse ActionGen (SAG) for extremely sparse action generation. To accommodate the iterative interactions, SAG customizes a rollout-adaptive prune-then-reuse mechanism that first identifies prunable computations globally and then reuses cached activations to substitute them during action diffusion. To capture the rollout dynamics, SAG parameterizes an observation-conditioned diffusion pruner for environment-aware adaptation and instantiates it with a highly parameter- and inference-efficient design for real-time prediction. Furthermore, SAG introduces a one-for-all reusing strategy that reuses activations across both timesteps and blocks in a zig-zag manner, minimizing the global redundancy. Extensive experiments on multiple robotic benchmarks demonstrate that SAG achieves up to 4× generation speedup without sacrificing performance. Project Page: https://sparse-actiongen.github.io/.
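The prune-then-reuse idea in the abstract can be illustrated with a toy denoising loop. This is a minimal sketch under stated assumptions, not the SAG implementation: the function `denoise_with_reuse`, the scalar "hidden state", the toy residual blocks, and the hard-coded `prune_mask` are all hypothetical. In the paper the mask would come from the observation-conditioned pruner, and the sketch reuses activations only across timesteps within the same block, not across blocks.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical block type: maps (hidden state, timestep) to a residual update.
Block = Callable[[float, int], float]


def denoise_with_reuse(
    x: float,
    blocks: List[Block],
    prune_mask: List[List[bool]],
) -> Tuple[float, int]:
    """Run one denoising step per row of prune_mask.

    prune_mask[t][b] == True means block b is pruned at step t and its
    most recent cached residual is reused instead of being recomputed.
    Returns the final state and the number of block evaluations performed.
    """
    cache: List[Optional[float]] = [None] * len(blocks)
    evaluations = 0
    for t, step_mask in enumerate(prune_mask):
        h = x
        for b, block in enumerate(blocks):
            if step_mask[b] and cache[b] is not None:
                residual = cache[b]  # reuse the cached activation
            else:
                # Either not pruned, or nothing cached yet: compute and store.
                residual = block(h, t)
                cache[b] = residual
                evaluations += 1
            h = h + residual
        x = h
    return x, evaluations


# Toy residual blocks (assumptions for illustration only).
blocks = [lambda h, t: -0.1 * h, lambda h, t: 0.05 * h]

# Dense schedule: every block is computed at each of 4 steps.
dense_mask = [[False, False] for _ in range(4)]
# Sparse schedule: compute everything at step 0, reuse afterwards.
sparse_mask = [[False, False]] + [[True, True] for _ in range(3)]

_, dense_evals = denoise_with_reuse(1.0, blocks, dense_mask)
_, sparse_evals = denoise_with_reuse(1.0, blocks, sparse_mask)
print(dense_evals, sparse_evals)  # prints: 8 2
```

In this toy setting the sparse schedule performs 2 block evaluations instead of 8. How much of that saving carries over to real speedup, and at what cost in action quality, depends on the mask the pruner predicts.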
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Robot Manipulation | Proficient Human (PH) Square | Success Rate | 89 | 9 |
| Robot Manipulation | Proficient Human (PH) Transport | Success Rate | 85 | 9 |
| Square | Mixed Human demonstration data | Success Rate | 0.79 | 9 |
| Can | Mixed Human (MH) demonstration data | Success Rate | 0.94 | 9 |
| Robot Manipulation | Proficient Human (PH) Can | Success Rate | 98 | 9 |
| Robot Manipulation | Proficient Human (PH) Tool | Success Rate | 50 | 9 |
| Transport | Mixed Human (MH) | Success Rate | 50 | 9 |
| Multi-stage Robotic Manipulation | Kitchen (test) | Success Rate (Kit_p1) | 100 | 9 |
| Lift | Mixed Human (MH) demonstration data | Success Rate | 100 | 9 |
| Robot Manipulation | Proficient Human (PH) Lift | Success Rate | 1 | 9 |