SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation
About
Directly regressing all 6 degrees-of-freedom (6DoF) of the object pose (i.e. the 3D rotation and translation) in a cluttered environment from a single RGB image is a challenging problem. While end-to-end methods have recently demonstrated promising results at high efficiency, their pose accuracy is still inferior to that of elaborate P$n$P/RANSAC-based approaches. In this work, we address this shortcoming by reasoning about self-occlusion to establish a two-layer representation for 3D objects, which considerably enhances the accuracy of end-to-end 6D pose estimation. Our framework, named SO-Pose, takes a single RGB image as input and generates 2D-3D correspondences and self-occlusion information using a shared encoder and two separate decoders. Both outputs are then fused to directly regress the 6DoF pose parameters. By incorporating cross-layer consistencies that align correspondences, self-occlusion and 6D pose, we further improve accuracy and robustness, surpassing or rivaling all other state-of-the-art approaches on various challenging datasets.
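The data flow described in the abstract (shared encoder, two decoders, fusion, direct pose regression) can be illustrated with a shape-level toy in NumPy. This is only a sketch under loud assumptions, not the authors' network: the layer sizes, the random weights, the 6-channel self-occlusion output, all function names, and the SVD projection used to obtain a valid rotation matrix are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

H, W = 8, 8    # toy image size (assumption)
FEAT = 32      # toy feature width (assumption)


def dense(in_dim, out_dim):
    """Random linear layer + tanh; a toy stand-in for a learned CNN block."""
    w = rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)
    return lambda x: np.tanh(x @ w)


encoder = dense(3, FEAT)        # shared encoder, applied per pixel
corr_decoder = dense(FEAT, 3)   # decoder 1: per-pixel 3D coordinate (2D-3D correspondence)
occ_decoder = dense(FEAT, 6)    # decoder 2: self-occlusion output (channel count is a placeholder)
pose_w = rng.standard_normal(((3 + 6) * H * W, 12)) * 0.01  # pose head: 9 rotation + 3 translation values


def project_to_rotation(m):
    """Nearest rotation matrix to m via SVD (an illustrative choice, not from the source)."""
    u, _, vt = np.linalg.svd(m)
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt


def estimate_pose(image):
    """Map an (H, W, 3) image to a rotation matrix R and a translation vector t."""
    feats = encoder(image.reshape(-1, 3))         # (H*W, FEAT) shared features
    corr = corr_decoder(feats)                    # (H*W, 3)
    occ = occ_decoder(feats)                      # (H*W, 6)
    fused = np.concatenate([corr, occ], axis=1)   # fuse both decoder outputs
    out = fused.reshape(-1) @ pose_w              # (12,) raw pose parameters
    return project_to_rotation(out[:9].reshape(3, 3)), out[9:]


R, t = estimate_pose(rng.random((H, W, 3)))
```

With random weights the resulting pose is meaningless; the point is only that both decoder outputs feed a single regression head that emits rotation and translation directly, with no P$n$P/RANSAC step.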
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 6D Pose Estimation | YCB-Video | AUC (ADD-S) | 90.9 | 148 |
| 6DoF Pose Estimation | YCB-Video (test) | -- | -- | 72 |
| 6D Object Pose Estimation | OccludedLINEMOD (test) | ADD(S) | 62.3 | 45 |
| 6D Object Pose Estimation | LM-O (test) | -- | -- | 22 |
| Object Pose Estimation | LineMod (test) | -- | -- | 21 |
| 6D Object Pose Estimation | BOP (T-LESS, ITODD, YCB-V, LM-O) Challenge (test) | LM-O Score | 61.3 | 13 |
| 6D Pose Estimation | YCB-V | AR VSD | 65.2 | 5 |
| 6-DoF Pose Estimation | BOP LINEMOD, Occlusion LINEMOD, YCB-Video | AR VSD (LMO) | 44 | 5 |
| 6D Pose Estimation | LMO | AR VSD | 44.2 | 5 |
| 6D Pose Estimation | LMO and YCB-V | Mean AR | 66.4 | 4 |
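
Several rows above report ADD(S)-based scores. As a reminder of what that metric family measures, here is a minimal sketch using the commonly used definitions: ADD averages the distance between corresponding model points under the estimated and ground-truth poses, and ADD-S replaces the one-to-one correspondence with a nearest-neighbour match so that symmetric objects are not penalised. The function names are hypothetical, and this is not the evaluation code behind the numbers in the table.

```python
import numpy as np


def transform(points, R, t):
    """Apply a rigid transform to an (N, 3) array of model points."""
    return points @ R.T + t


def add_error(points, R_est, t_est, R_gt, t_gt):
    """ADD: mean distance between corresponding transformed points."""
    diff = transform(points, R_est, t_est) - transform(points, R_gt, t_gt)
    return float(np.linalg.norm(diff, axis=1).mean())


def add_s_error(points, R_est, t_est, R_gt, t_gt):
    """ADD-S: mean distance from each ground-truth point to its nearest estimated point."""
    est = transform(points, R_est, t_est)
    gt = transform(points, R_gt, t_gt)
    # Pairwise distances, shape (N, N); fine for small N, use a KD-tree for large models.
    dists = np.linalg.norm(gt[:, None, :] - est[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())


# Toy model: the 8 corners of a cube centred at the origin.
cube = np.array([[x, y, z] for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (-1.0, 1.0)])
identity = np.eye(3)
zero = np.zeros(3)
flip_z = np.diag([-1.0, -1.0, 1.0])   # 180 degree rotation about z, a symmetry of the cube

shift_add = add_error(cube, identity, np.array([0.1, 0.0, 0.0]), identity, zero)
sym_add = add_error(cube, flip_z, zero, identity, zero)
sym_add_s = add_s_error(cube, flip_z, zero, identity, zero)
```

In the toy example the 180 degree flip leaves the cube's corner set unchanged, so ADD-S is zero while ADD is large. A pose is commonly counted as correct when the error falls below a fraction of the object diameter (10% is a frequent choice), and the reported score is the share of correct poses.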