Is Pseudo-Lidar needed for Monocular 3D Object detection?

About

Recent progress in 3D object detection from single images leverages monocular depth estimation as a way to produce 3D point clouds, turning cameras into pseudo-lidar sensors. These two-stage detectors improve with the accuracy of the intermediate depth estimation network, which can itself be improved without manual labels via large-scale self-supervised learning. However, they tend to suffer from overfitting more than end-to-end methods, are more complex, and the gap with similar lidar-based detectors remains significant. In this work, we propose an end-to-end, single-stage, monocular 3D object detector, DD3D, that can benefit from depth pre-training like pseudo-lidar methods, but without their limitations. Our architecture is designed for effective information transfer between depth estimation and 3D detection, allowing us to scale with the amount of unlabeled pre-training data. Our method achieves state-of-the-art results on two challenging benchmarks, with 16.34% and 9.28% AP for Cars and Pedestrians (respectively) on the KITTI-3D benchmark, and 41.5% mAP on nuScenes.
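Pseudo-lidar methods, the two-stage approach DD3D is designed to replace, turn a predicted depth map into a point cloud by back-projecting every pixel through the camera intrinsics: for pixel (u, v) with depth z, x = (u - c_x) z / f_x and y = (v - c_y) z / f_y. The sketch below illustrates only that standard pinhole back-projection step. It assumes an undistorted pinhole camera and keeps points in the camera frame (no camera-to-lidar extrinsics). The function name `depth_to_pseudo_lidar` is hypothetical and not taken from any pseudo-lidar or DD3D codebase.

```python
import numpy as np


def depth_to_pseudo_lidar(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Back-project an H x W depth map into an (H*W) x 3 point cloud.

    Assumes a pinhole camera with 3 x 3 intrinsics K and no lens distortion;
    points are returned in the camera frame as (x, y, z) rows.
    """
    h, w = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # u[i, j] = j (column index), v[i, j] = i (row index).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    # Stack per-pixel coordinates and flatten in row-major pixel order.
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)


K = np.array([[2.0, 0.0, 1.0], [0.0, 2.0, 1.0], [0.0, 0.0, 1.0]])
points = depth_to_pseudo_lidar(np.full((2, 2), 4.0), K)
print(points[0])  # pixel (u=0, v=0) -> [-2. -2.  4.]
```

In a full two-stage pipeline these points would then be fed to a lidar-style 3D detector. DD3D's single-stage design skips this intermediate representation.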

Dennis Park, Rares Ambrus, Vitor Guizilini, Jie Li, Adrien Gaidon • 2021

Related benchmarks

Task	Dataset	Metric	Result
3D Object Detection	nuScenes (test)	mAP	41.8	903
3D Object Detection	nuScenes v1.0 (test)	mAP	41.8	230
3D Object Detection	KITTI car (test)	AP3D (Easy)	23.22	226
3D Object Detection	KITTI Pedestrian (test)	AP3D (Easy)	1.39e+3	75
3D Object Detection	KITTI (test)	--	--	60
3D Object Detection	KITTI Cyclist (test)	AP3D (Easy)	239	59
Bird's eye view object detection	KITTI (test)	APBEV@0.7 (Easy)	32.35	53
3D Object Detection	KITTI (test)	AP3D (Easy)	23.22	26
Monocular 3D Object Detection	KITTI car (test)	AP3D R40 (Easy, IoU=0.7)	23.22	19
Monocular 3D Object Detection	KITTI 3D (test)	AP3D (Easy)	23.22	19
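The KITTI rows report AP3D, and one of them names the R40 interpolation explicitly. Under R40, precision is sampled at 40 equally spaced recall levels {1/40, 2/40, ..., 1}, with recall 0 excluded. At each level, the interpolated precision is the highest precision reached at any recall greater than or equal to that level. The sketch below is a minimal illustration that assumes the precision/recall curve has already been computed from detections matched to ground truth at the IoU threshold (e.g. 0.7 for cars). The official KITTI devkit also performs that matching and the Easy/Moderate/Hard difficulty filtering, which are omitted here. `average_precision_r40` is a hypothetical helper, not part of the devkit. It returns a fraction, so multiply by 100 to compare with the percentages in the table.

```python
import numpy as np


def average_precision_r40(recall, precision) -> float:
    """Interpolated AP sampled at recall levels k/40 for k = 1..40."""
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    total = 0.0
    # k / 40 is computed exactly for each level, avoiding linspace drift.
    for r in np.arange(1, 41) / 40.0:
        # Interpolated precision: best precision at any recall >= r.
        mask = recall >= r
        total += precision[mask].max() if mask.any() else 0.0
    return float(total / 40)


# Two operating points: full precision up to recall 0.5, then 0.5 precision.
ap = average_precision_r40([0.5, 1.0], [1.0, 0.5])
print(ap)  # -> 0.75
```

Dropping the recall-0 sample is what distinguishes R40 from the older 11-point metric, which includes recall 0 and can therefore credit a single confident detection with a non-trivial AP.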
