Cov2Pose: Leveraging Spatial Covariance for Direct Manifold-aware 6-DoF Object Pose Estimation
About
In this paper, we address the problem of 6-DoF object pose estimation from a single RGB image. Indirect methods, which typically predict intermediate 2D keypoints and then apply a Perspective-n-Point solver, have shown strong performance. Direct approaches, which regress the pose end-to-end, are usually more computationally efficient but less accurate. Direct pose regression heads, however, rely on globally pooled features and ignore spatial second-order statistics, even though these are informative for pose prediction. In most cases they also predict discontinuous pose representations that lack robustness. We therefore propose a covariance-pooled representation that encodes convolutional feature distributions as a symmetric positive definite (SPD) matrix. Moreover, we propose a novel pose encoding in the form of an SPD matrix via its Cholesky decomposition. The pose is then regressed end-to-end with a manifold-aware network head that takes into account the Riemannian geometry of SPD matrices. Experiments and ablations consistently demonstrate the relevance of second-order pooling and continuous representations for direct pose regression, including under partial occlusion.
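To make the idea of covariance pooling concrete, the following is a minimal NumPy sketch and not the paper's implementation. It pools a `(C, H, W)` feature map into a `C x C` covariance matrix, regularizes it so that it is SPD, and shows two standard operations on SPD matrices: the Cholesky factorization and the matrix logarithm (a log-Euclidean map). The function names `covariance_pool` and `spd_log`, the regularizer `eps`, and the random feature map are all assumptions made for illustration; the abstract does not specify how the network head or the pose encoding is built.

```python
import numpy as np

def covariance_pool(features, eps=1e-5):
    """Pool a (C, H, W) feature map into a (C, C) SPD matrix.

    Hypothetical helper: eps * I is added so the result is strictly
    positive definite even when the sample covariance is rank-deficient.
    """
    c = features.shape[0]
    x = features.reshape(c, -1)              # (C, N): one column per spatial location
    x = x - x.mean(axis=1, keepdims=True)    # center each channel
    n = x.shape[1]
    cov = x @ x.T / max(n - 1, 1)            # sample covariance across locations
    return cov + eps * np.eye(c)

def spd_log(spd):
    """Matrix logarithm of an SPD matrix via eigendecomposition.

    Hypothetical helper: maps the SPD matrix to a symmetric matrix in a
    flat (tangent) space, one common way to respect SPD geometry.
    """
    w, v = np.linalg.eigh(spd)               # eigenvalues w are positive for SPD input
    return (v * np.log(w)) @ v.T

# Stand-in for a convolutional feature map (assumption: 8 channels, 4x4 grid).
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))

cov = covariance_pool(feat)                  # (8, 8) SPD matrix
chol = np.linalg.cholesky(cov)               # lower-triangular factor, cov = chol @ chol.T
log_cov = spd_log(cov)                       # symmetric matrix in the tangent space
```

The Cholesky factor illustrates why an SPD matrix can be parameterized by a lower-triangular matrix with a positive diagonal, which is the kind of correspondence the abstract's Cholesky-based pose encoding relies on; the exact mapping from a 6-DoF pose to that factor is not given here.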
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 6D Pose Estimation | YCB-V | AUC (ADD-S) | 90 | 27 |
| Spacecraft Pose Estimation | SPEED+ Lightbox | Translation Error (m) | 0.3 | 19 |
| Spacecraft Pose Estimation | SPEED+ Sunlamp | ET (m) | 0.43 | 19 |
| Camera Pose Regression | Cambridge Landmarks (test) | Translation Error (Kings College, Median, m) | 1.57 | 16 |
| 6DoF Pose Estimation | Occlusion Linemod (Part I) | Average Error | 76.8 | 16 |
| 6-DoF Pose Estimation | LM | ADD(-S) (Avg) | 97.2 | 9 |
| Spacecraft Pose Estimation | SPEED+ synthetic | Translation Error (m) | 0.184 | 1 |