OrbitGrasp: $SE(3)$-Equivariant Grasp Learning
About
Grasp detection is an important part of any robotic manipulation pipeline, yet reliable and accurate grasp detection in $SE(3)$ remains a research challenge. Many robotics applications in unstructured environments, such as homes and warehouses, would benefit substantially from better grasp performance. This paper proposes a novel framework for detecting $SE(3)$ grasp poses from point cloud input. Our main contribution is an $SE(3)$-equivariant model that maps each point in the cloud to a continuous grasp quality function over the 2-sphere $S^2$, represented with spherical harmonic basis functions. Compared with reasoning about a finite set of samples, this formulation improves the accuracy and efficiency of our model when a large number of samples would otherwise be needed. To accomplish this, we propose a novel variation on EquiFormerV2 that leverages a UNet-style encoder-decoder architecture to enlarge the number of points the model can handle. Our resulting method, which we name $\textit{OrbitGrasp}$, significantly outperforms baselines in both simulation and physical experiments.
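To make the idea of a continuous grasp quality function over $S^2$ more concrete, the following is a minimal sketch of how a handful of spherical harmonic coefficients attached to one point can be evaluated at arbitrary directions. It is not the OrbitGrasp implementation: the function names (`real_sph_harm_basis`, `grasp_quality`, `best_direction`), the truncation at degree 2, and the hand-set coefficient vector are all assumptions made for illustration. In the actual method the coefficients would be predicted per point by the equivariant network.

```python
import numpy as np


def real_sph_harm_basis(dirs):
    """Real spherical harmonics up to degree 2 (9 functions) at unit directions.

    dirs: array of shape (N, 3); rows are normalized internally.
    Returns an array of shape (N, 9), ordered (l, m) =
    (0,0), (1,-1), (1,0), (1,1), (2,-2), (2,-1), (2,0), (2,1), (2,2).
    """
    dirs = np.atleast_2d(np.asarray(dirs, dtype=float))
    dirs = dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)
    x, y, z = dirs[:, 0], dirs[:, 1], dirs[:, 2]

    c0 = 0.5 * np.sqrt(1.0 / np.pi)          # degree 0 constant
    c1 = np.sqrt(3.0 / (4.0 * np.pi))        # degree 1 constant
    c2 = 0.5 * np.sqrt(15.0 / np.pi)         # degree 2 constant (m = -2, -1, 1)

    return np.stack(
        [
            c0 * np.ones_like(x),
            c1 * y,
            c1 * z,
            c1 * x,
            c2 * x * y,
            c2 * y * z,
            0.25 * np.sqrt(5.0 / np.pi) * (3.0 * z**2 - 1.0),
            c2 * x * z,
            0.5 * c2 * (x**2 - y**2),
        ],
        axis=-1,
    )


def grasp_quality(coeffs, dirs):
    """Evaluate f(d) = sum_k coeffs[k] * Y_k(d) at each direction d.

    coeffs: shape (9,), hypothetical per-point coefficients.
    The result is an unnormalized score, one value per direction.
    """
    return real_sph_harm_basis(dirs) @ np.asarray(coeffs, dtype=float)


def best_direction(coeffs, n_samples=2000, seed=0):
    """Pick the highest-scoring direction among random samples on the sphere."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_samples, 3))
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    scores = grasp_quality(coeffs, dirs)
    i = int(np.argmax(scores))
    return dirs[i], float(scores[i])


# Hypothetical coefficients: only the (l=1, m=0) term is active,
# so the score is proportional to the z-component of the direction.
coeffs = np.zeros(9)
coeffs[2] = 1.0
direction, score = best_direction(coeffs)
```

Because the function is defined everywhere on the sphere, any number of candidate directions can be scored with a single matrix product once the coefficients are known, rather than running the network once per candidate. That is one way to read the efficiency argument above, under the assumptions stated here.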
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Clutter removal | Pile scenes single-view, fixed camera, gamma noise | GSR | 69.3 | 16 |
| Clutter removal | Packed scenes single-view, fixed camera, gamma noise | GSR | 71.1 | 16 |
| Clutter removal | Packed single-view, random camera pose, Gaussian noise | GSR | 98.1 | 10 |
| Clutter removal | Pile single-view, random camera pose, Gaussian noise | GSR | 91.6 | 10 |