CAPE: Context-Aware Diffusion Policy Via Proximal Mode Expansion for Collision Avoidance
About
In robotics, diffusion models can capture multi-modal trajectories from demonstrations, making them a transformative approach in imitation learning. However, achieving optimal performance with this approach requires a large-scale dataset, which is costly to obtain, especially for challenging tasks such as collision avoidance. In those tasks, generalization at test time demands coverage of many obstacle types and their spatial configurations, which is impractical to acquire purely through data collection. To remedy this problem, we propose Context-Aware diffusion policy via Proximal mode Expansion (CAPE), a framework that expands trajectory distribution modes with a context-aware prior and guidance at inference time via a novel prior-seeded iterative guided refinement procedure. The framework generates an initial trajectory plan and executes a short prefix of it; the remaining trajectory segment is then perturbed to an intermediate noise level, forming a trajectory prior. This prior is context-aware and preserves task intent. Repeating the process with context-aware guided denoising iteratively expands mode support, which allows the policy to find smoother, less collision-prone trajectories. For collision avoidance, CAPE expands trajectory distribution modes with collision-aware context, enabling the sampling of collision-free trajectories in previously unseen environments while maintaining goal consistency. We evaluate CAPE on diverse manipulation tasks in cluttered, unseen simulated and real-world settings and show up to 26% and 80% higher success rates, respectively, compared to state-of-the-art (SOTA) methods, demonstrating better generalization to unseen environments.
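The loop described above (plan, execute a short prefix, re-noise the remainder into a prior, then denoise it with guidance) can be illustrated with a toy sketch. This is not the authors' implementation. The learned diffusion denoiser is replaced here by a hypothetical stand-in that pulls waypoints toward a straight line to the goal, and the guidance term is an assumed quadratic penetration cost around a single circular obstacle. All function names, parameters, and default values are illustrative assumptions.

```python
import numpy as np

def toy_denoise_step(traj, start, goal, strength=0.3):
    """Hypothetical stand-in for a learned denoiser: pull waypoints toward the start-goal line."""
    alphas = np.linspace(0.0, 1.0, len(traj))[:, None]
    line = (1.0 - alphas) * start + alphas * goal
    return traj + strength * (line - traj)

def collision_guidance(traj, obstacle, radius, margin=0.2):
    """Descent direction of an assumed cost 0.5 * max(0, radius + margin - dist)^2 per waypoint."""
    diff = traj - obstacle
    dist = np.linalg.norm(diff, axis=1, keepdims=True) + 1e-8
    penetration = np.clip(radius + margin - dist, 0.0, None)
    return penetration * diff / dist  # points away from the obstacle, zero outside the margin

def guided_refine(prior, start, goal, obstacle, radius,
                  noise_level, steps, guide_scale, rng):
    """Perturb a trajectory prior to a chosen noise level, then denoise it with guidance."""
    x = prior + noise_level * rng.standard_normal(prior.shape)
    for _ in range(steps):
        x = toy_denoise_step(x, start, goal)
        x = x + guide_scale * collision_guidance(x, obstacle, radius)
        x[0], x[-1] = start, goal  # keep the endpoints fixed (goal consistency)
    return x

def rollout(start, goal, obstacle, radius, horizon=16, prefix=4, seed=0):
    """Plan, execute a short prefix, then reuse the remaining segment as the next prior."""
    rng = np.random.default_rng(seed)
    start = np.asarray(start, dtype=float)
    goal = np.asarray(goal, dtype=float)
    obstacle = np.asarray(obstacle, dtype=float)

    # Initial plan: denoise from (nearly) pure noise.
    plan = guided_refine(np.zeros((horizon, 2)), start, goal, obstacle, radius,
                         noise_level=1.0, steps=20, guide_scale=1.0, rng=rng)
    executed = [plan[0]]
    while len(plan) > prefix + 1:
        executed.extend(plan[1:prefix + 1])   # execute a short prefix
        remaining = plan[prefix:]             # begins at the current position
        # Intermediate noise level: the prior keeps the overall intent of the old plan.
        plan = guided_refine(remaining, remaining[0].copy(), goal, obstacle, radius,
                             noise_level=0.3, steps=10, guide_scale=1.0, rng=rng)
    executed.extend(plan[1:])                 # execute whatever is left
    return np.array(executed)
```

Calling `rollout([0.0, 0.0], [1.0, 0.0], obstacle=[0.5, 0.0], radius=0.15)` returns a `(16, 2)` array of executed waypoints that begins at the start and ends at the goal. The smaller `noise_level` in the refinement rounds is the key design choice in this sketch: it lets each new plan deviate from the previous one without discarding it, which is the intuition behind seeding the denoiser with a prior rather than resampling from pure noise.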
Related benchmarks
| Task | Dataset | Success Rate | Rank |
|---|---|---|---|
| Collision Avoidance | ENV1: CONCEPT (Full observation) | 0.94 | 4 |
| Collision Avoidance | ENV2 EASY (Full observation) | 98 | 4 |
| Collision Avoidance | ENV3 MEDIUM (Full observation) | 82 | 4 |
| Collision Avoidance | ENV4 HARD (Full observation) | 0.75 | 4 |
| Collision Avoidance | ENV3 MEDIUM (Limited observation) | 76 | 4 |
| Pick Tape | Real-world environment | 80 | 3 |
| Pick and Place Cup | Real-world environment | 100 | 3 |