Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving
About
Pseudo-LiDAR 3D detectors have made remarkable progress in monocular 3D detection by improving depth perception with depth estimation networks and by using LiDAR-based 3D detection architectures. Advanced stereo 3D detectors can also localize 3D objects accurately. The gap in image-to-image generation for stereo views is much smaller than the gap in image-to-LiDAR generation. Motivated by this, we propose a Pseudo-Stereo 3D detection framework with three novel virtual view generation methods (image-level generation, feature-level generation, and feature-clone) for detecting 3D objects from a single image. Our analysis of depth-aware learning shows that, in our framework, the depth loss is effective only in feature-level virtual view generation, while the estimated depth map is effective in both image-level and feature-level generation. We propose a disparity-wise dynamic convolution, with dynamic kernels sampled from the disparity feature map, that adaptively filters the features of a single image to generate virtual image features; this eases the feature degradation caused by depth estimation errors. As of submission (November 18, 2021), our Pseudo-Stereo 3D detection framework ranks 1st on car, pedestrian, and cyclist among published monocular 3D detectors on the KITTI-3D benchmark. The code is released at https://github.com/revisitq/Pseudo-Stereo-3D.
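As a rough illustration of the image-level virtual view generation mentioned above, the sketch below converts an estimated depth map to horizontal disparity with the standard stereo relation d = f · b / z and forward-warps the left image into a virtual right view. It is not the released implementation: the function names, the nearest-pixel warp with a simple disparity buffer for occlusions, and the example camera parameters (roughly KITTI-like focal length and baseline) are assumptions made for illustration.

```python
import numpy as np


def depth_to_disparity(depth, focal_px, baseline_m):
    """Convert metric depth (metres) to horizontal disparity (pixels): d = f * b / z."""
    # Clip to avoid division by zero for invalid (zero) depth values.
    return focal_px * baseline_m / np.clip(depth, 1e-3, None)


def forward_warp_to_right(left, disparity):
    """Forward-warp a left image into a virtual right view (hypothetical helper).

    A pixel at column u in the left image lands at column u - d in the right
    view. When several source pixels land on the same target, the one with the
    largest disparity (the nearest surface) wins. Unfilled targets stay zero.
    """
    h, w = disparity.shape
    right = np.zeros_like(left)
    best = np.full((h, w), -np.inf)  # largest disparity written so far per target
    for v in range(h):
        for u in range(w):
            d = float(disparity[v, u])
            t = int(round(u - d))
            if 0 <= t < w and d > best[v, t]:
                best[v, t] = d
                right[v, t] = left[v, u]
    return right


# Example with assumed, approximately KITTI-like camera parameters.
FOCAL_PX = 721.5    # assumption: focal length in pixels
BASELINE_M = 0.54   # assumption: stereo baseline in metres

left_image = np.random.rand(4, 8, 3)
depth_map = np.full((4, 8), 20.0)  # a flat scene 20 m away
disp = depth_to_disparity(depth_map, FOCAL_PX, BASELINE_M)
virtual_right = forward_warp_to_right(left_image, disp)
```

The explicit loops keep the occlusion rule easy to read; a practical version would vectorize the warp and fill the holes left by disoccluded regions.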
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Object Detection | KITTI (val) | AP3D (Moderate) | 24.15 | 85 |
| 3D Object Detection | KITTI Pedestrian (test) | AP3D (Easy) | 1.70e+3 | 63 |
| 3D Object Detection | KITTI (test) | -- | -- | 60 |
| Bird's eye view object detection | KITTI (test) | APBEV@0.7 (Easy) | 32.64 | 53 |
| 3D Object Detection | KITTI official (test) | 3D AP (Easy) | 23.74 | 43 |
| BEV Object Detection | KITTI official (test) | AP40 (Easy) | 32.84 | 22 |
| 3D Object Detection | KITTI official (val) | AP40 (Easy) | 35.18 | 21 |
| Monocular 3D Object Detection (Car) | KITTI official (test) | AP3D (Easy) | 23.74 | 17 |
| Pedestrian Detection | KITTI (test) | AP BEV (Easy) | 12.8 | 9 |