MonoCD: Monocular 3D Object Detection with Complementary Depths

About

Monocular 3D object detection has attracted widespread attention due to its potential to accurately obtain object 3D localization from a single image at a low cost. Depth estimation is an essential but challenging subtask of monocular 3D object detection due to the ill-posedness of the 2D-to-3D mapping. Many methods explore multiple local depth clues such as object heights and keypoints and then formulate the object depth estimation as an ensemble of multiple depth predictions to mitigate the insufficiency of single-depth information. However, the errors of existing multiple depths tend to have the same sign, which hinders them from neutralizing each other and limits the overall accuracy of the combined depth. To alleviate this problem, we propose to increase the complementarity of depths with two novel designs. First, we add a new depth prediction branch named complementary depth that utilizes global and efficient depth clues from the entire image rather than the local clues to reduce the correlation of depth predictions. Second, we propose to fully exploit the geometric relations between multiple depth clues to achieve complementarity in form. Benefiting from these designs, our method achieves higher complementarity. Experiments on the KITTI benchmark demonstrate that our method achieves state-of-the-art performance without introducing extra data. In addition, complementary depth can also be a lightweight and plug-and-play module to boost multiple existing monocular 3D object detectors. Code is available at https://github.com/elvintanhust/MonoCD.
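
The point about error signs can be illustrated with a toy calculation. The sketch below is not the MonoCD implementation: the function names, the equal-weight averaging, and all numeric values are assumptions made for illustration. It shows the standard pinhole relation commonly used to turn an object-height clue into a depth, and then compares the signed error of an averaged depth when the two individual errors share a sign versus when they have opposite signs.

```python
def depth_from_height(focal_px, height_3d_m, height_2d_px):
    """Pinhole-camera depth (metres) from an object's physical height
    and its projected height in pixels: z = f * H / h."""
    return focal_px * height_3d_m / height_2d_px


def ensemble_error(errors, weights=None):
    """Signed error of a weighted average of depth predictions.

    `errors` holds the signed error (prediction minus ground truth) of
    each depth branch. Equal weights are assumed when none are given.
    """
    if weights is None:
        weights = [1.0 / len(errors)] * len(errors)
    return sum(w * e for w, e in zip(weights, errors))


# Hypothetical numbers: a 1.5 m tall car that spans 36 px in an image
# taken with a 720 px focal length.
z_height = depth_from_height(focal_px=720.0, height_3d_m=1.5, height_2d_px=36.0)

# Two branches that both overestimate depth: averaging only shrinks the
# error to the mean of the two.
same_sign = ensemble_error([1.0, 0.6])

# One branch overestimates and the other underestimates: the errors
# partially cancel in the average.
opposite_sign = ensemble_error([1.0, -0.6])

print(f"height-based depth: {z_height:.1f} m")
print(f"combined error, same sign:     {same_sign:+.2f} m")
print(f"combined error, opposite sign: {opposite_sign:+.2f} m")
```

With the individual error magnitudes held fixed, the averaged error is smaller when the signs differ, which is the property the abstract refers to as complementarity.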

Longfei Yan, Pei Yan, Shengzhou Xiong, Xuanyu Xiang, Yihua Tan • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Object Detection | KITTI car (test) | AP3D (Easy) | 25.53 | 195 |
| 3D Object Detection | KITTI car (val) | AP 3D Easy | 26.45 | 62 |
| Bird's Eye View Object Detection (Car) | KITTI (test) | APBEV (Easy) @IoU=0.7 | 33.41 | 59 |
| Bird's Eye View (BEV) Detection | KITTI Cars (IoU3D ≥ 0.7) (test) | APBEV R40 (Easy) | 33.41 | 52 |
| 3D Object Detection | KITTI (test) | 3D AP (Easy) | 25.53 | 43 |
| Monocular 3D Object Detection | KITTI (test) | AP3D R40 (Mod.) | 16.59 | 38 |
| Monocular 3D Object Detection | KITTI car category (val) | AP 3D (R40) | 19.37 | 37 |
| Monocular 3D Object Detection | Waymo Open Dataset 79 (val) | AP@0.5 (3D, L1) | 1.16e+3 | 24 |
| 3D Object Detection | KITTI Car category IoU=0.7 (test) | AP3D R40 (Easy) | 25.53 | 21 |
| Bird's eye view object detection | KITTI car (val) | APBEV R40 Easy | 34.6 | 20 |
Showing 10 of 14 rows
