Canonical Policy: Learning Canonical 3D Representation for SE(3)-Equivariant Policy
About
Visual imitation learning has achieved remarkable progress in robotic manipulation, yet generalization to unseen objects, scene layouts, and camera viewpoints remains a key challenge. Recent advances address this by using 3D point clouds, which provide geometry-aware, appearance-invariant representations, and by incorporating equivariance into policy architectures to exploit spatial symmetries. However, existing equivariant approaches often lack interpretability and rigor due to unstructured integration of equivariant components. We introduce canonical policy, a principled framework for 3D equivariant imitation learning that unifies 3D point cloud observations under a canonical representation. We first establish a theory of 3D canonical representations, enabling equivariant observation-to-action mappings by grouping both seen and novel point clouds under a canonical representation. We then propose a flexible policy learning pipeline that leverages the geometric symmetries of the canonical representation and the expressiveness of modern generative models. We validate canonical policy on 12 diverse simulated tasks and 4 real-world manipulation tasks across 16 configurations, involving variations in object color, shape, camera viewpoint, and robot platform. Compared to state-of-the-art imitation learning policies, canonical policy achieves an average improvement of 18.0% in simulation and 39.7% in real-world experiments, demonstrating superior generalization capability and sample efficiency. For more details, please refer to the project website: https://zhangzhiyuanzhang.github.io/cp-website/.
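To make the idea of a canonical representation concrete, the following is a minimal NumPy sketch, not the construction used in the paper. It assumes a simple PCA-based canonical frame (centroid plus principal axes, with signs fixed by the third moment), and the names `canonicalize`, `toy_policy`, and `rotation_from_axis_angle` are hypothetical. It illustrates why acting in a canonical frame gives an SE(3)-equivariant observation-to-action mapping: a rigidly moved copy of a point cloud maps to the same canonical cloud, so the predicted action moves with the scene.

```python
import numpy as np

def canonicalize(points):
    """PCA-based canonical frame (illustrative assumption, not the paper's method).

    Returns (canonical, axes, centroid) with points ~= canonical @ axes.T + centroid.
    Assumes distinct principal variances and a non-symmetric cloud.
    """
    centroid = points.mean(axis=0)
    centered = points - centroid
    # eigh returns eigenvalues in ascending order; reverse for largest-first axes.
    _, vecs = np.linalg.eigh(centered.T @ centered)
    axes = vecs[:, ::-1].copy()
    # Resolve the sign ambiguity of the first two axes with the third moment.
    for i in range(2):
        if np.sum((centered @ axes[:, i]) ** 3) < 0:
            axes[:, i] *= -1
    # Third axis from the cross product so the frame is right-handed (det = +1).
    axes[:, 2] = np.cross(axes[:, 0], axes[:, 1])
    return centered @ axes, axes, centroid

def toy_policy(canonical):
    """Stand-in for a learned policy: returns a 3D target in the canonical frame."""
    return canonical.max(axis=0)

def rotation_from_axis_angle(axis, angle):
    """Rodrigues' formula for a proper rotation matrix."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    k = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)

# A skewed, anisotropic synthetic cloud and a rigidly transformed copy of it.
rng = np.random.default_rng(0)
cloud = rng.exponential(size=(500, 3)) * np.array([3.0, 2.0, 1.0])
R = rotation_from_axis_angle([1.0, 2.0, 3.0], 0.9)
t = np.array([0.5, -1.0, 2.0])
moved = cloud @ R.T + t

canon_a, axes_a, c_a = canonicalize(cloud)
canon_b, axes_b, c_b = canonicalize(moved)

# Predict in the canonical frame, then map the action back to each world frame.
action_a = axes_a @ toy_policy(canon_a) + c_a
action_b = axes_b @ toy_policy(canon_b) + c_b

same_canonical = np.allclose(canon_a, canon_b, atol=1e-8)
equivariant = np.allclose(action_b, R @ action_a + t, atol=1e-8)
```

Here `same_canonical` checks that both clouds share one canonical representation, and `equivariant` checks that the world-frame action transforms by the same rotation and translation as the observation. A PCA frame is fragile for symmetric or near-isotropic objects, which is one reason a learned or theoretically grounded canonicalization may be preferable in practice.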
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Block-stacking | Real-robot Franka Emika Panda (real-world) | Success Rate | 30 | 3 |
| Block-stacking | Real-world SE(3) layout variations 1.0 (test) | Success Rate | 30 | 3 |
| Robotic Manipulation | 16 Simulation Benchmarks (test) | Stack D1 | 79 | 3 |
| Shoe Alignment | Real-robot Franka Emika Panda (real-world) | Success Rate | 30 | 3 |
| Table Organization | Real-robot Franka Emika Panda (real-world) | Success Rate | 10 | 3 |
| Cloth Folding | Real-robot Franka Emika Panda (real-world) | Success Rate | 0 | 3 |
| Cloth Folding | Real-world SE(3) layout variations 1.0 (test) | Success Rate | 0 | 3 |
| Shoe Alignment | Real-world SE(3) layout variations 1.0 (test) | Success Rate | 30 | 3 |
| Table Organization | Real-world SE(3) layout variations 1.0 (test) | Success Rate | 0 | 3 |