OSOP: A Multi-Stage One Shot Object Pose Estimation Framework
About
We present a novel one-shot method for object detection and 6-DoF pose estimation that does not require training on target objects. At test time, it takes as input a target image and a textured 3D query model. The core idea is to represent a 3D model with a number of 2D templates rendered from different viewpoints. This enables CNN-based direct dense feature extraction and matching. The object is first localized in 2D, then its approximate viewpoint is estimated, followed by dense 2D-3D correspondence prediction. The final pose is computed with PnP. We evaluate the method on the LineMOD, Occlusion, Homebrewed, YCB-V and T-LESS datasets and report performance competitive with state-of-the-art methods trained on synthetic data, even though our method is not trained on the object models used for testing.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 6D Object Pose Estimation | BOP 7 core datasets: LM-O, T-LESS, TUD-L, IC-BIN, ITODD, HB, YCB-V 82 (test) | AR (LM-O) | 31.2 | 47 |
| Pose Estimation | BOP benchmark 2019 (test) | LM-O AR | 48.2 | 43 |
| 6D Pose Estimation | BOP challenge | LM-O | 48.2 | 39 |
| 6-DoF Pose Estimation | YCB-V BOP challenge 2020 | AR | 57.2 | 37 |
| 6D Pose Estimation | Homebrewed BOP challenge (test) | Avg Recall | 60.5 | 20 |
| 6D Pose Estimation | Occlusion dataset BOP challenge (test) | AR | 48.2 | 19 |
| 6-DoF Pose Estimation | Linemod RGB synthetic 11 (train) | ADD | 39.3 | 8 |
| 6-DoF Pose Estimation | Linemod RGBD synthetic 11 (train) | ADD | 73.3 | 7 |
| 2D Object Detection | LM BOP 14 (test) | Precision | 47 | 3 |
| 2D Object Detection | LMO BOP 14 (test) | Precision | 31 | 3 |