Cross-Modality Knowledge Distillation Network for Monocular 3D Object Detection

About

Using LiDAR-based detectors or real LiDAR point data to guide monocular 3D detection, as in Pseudo-LiDAR methods, has brought significant improvements. However, existing methods usually rely on non-end-to-end training strategies and make insufficient use of the LiDAR information, leaving the rich potential of the LiDAR data underexploited. In this paper, we propose the Cross-Modality Knowledge Distillation (CMKD) network for monocular 3D detection, which efficiently and directly transfers knowledge from the LiDAR modality to the image modality on both features and responses. We further extend CMKD into a semi-supervised training framework that distills knowledge from large-scale unlabeled data and significantly boosts performance. At the time of submission, CMKD ranks 1st among published monocular 3D detectors on both the KITTI test set and the Waymo val set, with significant performance gains over previous state-of-the-art methods.

Yu Hong, Hang Dai, Yong Ding • 2022
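
The abstract describes distillation "on both features and responses". As a rough illustration of what such a combined objective can look like, the sketch below pairs a mean-squared-error term on intermediate features with a soft-label cross-entropy term on output scores. It is not the authors' implementation: the function names, the choice of MSE and binary cross-entropy, and the weights `w_feat` and `w_resp` are all assumptions made for this example, and feature maps are represented as flat lists of floats to keep it dependency-free.

```python
import math

# Illustrative sketch only; NOT the CMKD authors' code.
# All names and loss choices below are hypothetical.


def feature_distill_loss(student_feat, teacher_feat):
    """Mean squared error between flattened student and teacher features."""
    if len(student_feat) != len(teacher_feat):
        raise ValueError("feature maps must have the same number of elements")
    squared = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared) / len(squared)


def response_distill_loss(student_prob, teacher_prob, eps=1e-7):
    """Binary cross-entropy of student scores against teacher soft labels."""
    if len(student_prob) != len(teacher_prob):
        raise ValueError("responses must have the same number of elements")
    total = 0.0
    for p, q in zip(student_prob, teacher_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= q * math.log(p) + (1.0 - q) * math.log(1.0 - p)
    return total / len(student_prob)


def total_distill_loss(student_feat, teacher_feat,
                       student_prob, teacher_prob,
                       w_feat=1.0, w_resp=1.0):
    """Weighted sum of the feature-level and response-level terms."""
    return (w_feat * feature_distill_loss(student_feat, teacher_feat)
            + w_resp * response_distill_loss(student_prob, teacher_prob))


if __name__ == "__main__":
    # Toy values standing in for image-branch (student) and
    # LiDAR-branch (teacher) outputs on a single sample.
    loss = total_distill_loss(
        student_feat=[0.2, 0.4, 0.1], teacher_feat=[0.3, 0.5, 0.0],
        student_prob=[0.6, 0.2], teacher_prob=[0.9, 0.1],
    )
    print(f"distillation loss: {loss:.4f}")
```

In this sketch both terms depend only on the teacher's outputs, not on ground-truth boxes, which is the general property that lets a distillation objective also be applied to unlabeled data.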

Related benchmarks

Task	Dataset	Result
3D Object Detection	KITTI car (test)	AP3D (Easy): 28.55	226
3D Object Detection	Waymo Open Dataset (val)	--	219
3D Object Detection	KITTI Pedestrian (test)	AP3D (Easy): 17.79	75
3D Object Detection	KITTI Cyclist (test)	AP3D (Easy): 9.6	65
3D Object Detection	Waymo Open Dataset LEVEL_2 (val)	3D AP (Overall): 12.99	60
3D Object Detection	Waymo Open Dataset LEVEL_1 (val)	3D AP: 14.69	60
3D Object Detection	KITTI (test)	AP3D (Easy): 28.55	26
Monocular 3D Object Detection	KITTI car (test)	AP3D R40 (Easy, IoU=0.7): 25.09	19
3D Object Detection	KITTI Cyclist official (test)	3D AP (Easy): 12.52	8

