MonoCD: Monocular 3D Object Detection with Complementary Depths
About
Monocular 3D object detection has attracted widespread attention due to its potential to accurately obtain object 3D localization from a single image at a low cost. Depth estimation is an essential but challenging subtask of monocular 3D object detection due to the ill-posedness of the 2D-to-3D mapping. Many methods explore multiple local depth clues, such as object heights and keypoints, and then formulate object depth estimation as an ensemble of multiple depth predictions to mitigate the insufficiency of single-depth information. However, the errors of existing multiple depths tend to have the same sign, which hinders them from neutralizing each other and limits the overall accuracy of the combined depth. To alleviate this problem, we propose to increase the complementarity of depths with two novel designs. First, we add a new depth prediction branch named complementary depth that utilizes global and efficient depth clues from the entire image, rather than local clues, to reduce the correlation of depth predictions. Second, we propose to fully exploit the geometric relations between multiple depth clues to achieve complementarity in form. Benefiting from these designs, our method achieves higher complementarity. Experiments on the KITTI benchmark demonstrate that our method achieves state-of-the-art performance without introducing extra data. In addition, complementary depth can also serve as a lightweight, plug-and-play module to boost multiple existing monocular 3D object detectors. Code is available at https://github.com/elvintanhust/MonoCD.
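The point about same-sign errors can be illustrated with a small numeric sketch. It assumes the ensemble is a plain weighted average of per-branch depth predictions; the helper `combine_depths` and all numbers are hypothetical and are not taken from the MonoCD code.

```python
def combine_depths(depths, weights=None):
    """Weighted average of several depth predictions (in metres).

    Hypothetical helper: a plain weighted mean stands in for whatever
    ensemble rule a real detector uses.
    """
    if weights is None:
        weights = [1.0] * len(depths)
    return sum(d * w for d, w in zip(depths, weights)) / sum(weights)


true_depth = 20.0  # illustrative ground-truth depth

# Case 1: both branches overestimate, so their errors share a sign.
same_sign = [true_depth + 1.0, true_depth + 0.6]

# Case 2: one branch overestimates and the other underestimates,
# i.e. the errors are complementary.
opposite_sign = [true_depth + 1.0, true_depth - 0.6]

err_same = combine_depths(same_sign) - true_depth
err_opp = combine_depths(opposite_sign) - true_depth

print(f"same-sign combined error:     {err_same:+.2f} m")
print(f"opposite-sign combined error: {err_opp:+.2f} m")
```

With equal weights, the same-sign case keeps an error of roughly the mean of the two individual errors, whereas the opposite-sign case partly cancels. This is only a toy picture of the motivation, not the paper's formulation.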
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Object Detection | KITTI car (test) | AP3D (Easy) | 25.53 | 195 |
| 3D Object Detection | KITTI car (val) | AP 3D Easy | 26.45 | 62 |
| Bird's Eye View Object Detection (Car) | KITTI (test) | APBEV (Easy) @IoU=0.7 | 33.41 | 59 |
| Bird's Eye View (BEV) Detection | KITTI Cars (IoU3D ≥ 0.7) (test) | APBEV R40 (Easy) | 33.41 | 52 |
| 3D Object Detection | KITTI (test) | 3D AP (Easy) | 25.53 | 43 |
| Monocular 3D Object Detection | KITTI (test) | AP3D R40 (Mod.) | 16.59 | 38 |
| Monocular 3D Object Detection | KITTI car category (val) | AP 3D (R40) | 19.37 | 37 |
| Monocular 3D Object Detection | Waymo Open Dataset 79 (val) | AP@0.5 (3D, L1) | 1.16e+3 | 24 |
| 3D Object Detection | KITTI Car category IoU=0.7 (test) | AP3D R40 (Easy) | 25.53 | 21 |
| Bird's eye view object detection | KITTI car (val) | APBEV R40 Easy | 34.6 | 20 |