Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation
About
In this paper, we propose a novel system named Disp R-CNN for 3D object detection from stereo images. Many recent works solve this problem by first recovering a point cloud with disparity estimation and then applying a 3D detector. The disparity map is computed for the entire image, which is costly and fails to leverage category-specific priors. In contrast, we design an instance disparity estimation network (iDispNet) that predicts disparity only for pixels on objects of interest and learns a category-specific shape prior for more accurate disparity estimation. To address the scarcity of disparity annotations for training, we propose to use a statistical shape model to generate dense disparity pseudo-ground-truth without the need for LiDAR point clouds, which makes our system more widely applicable. Experiments on the KITTI dataset show that, even when LiDAR ground-truth is not available at training time, Disp R-CNN achieves competitive performance and outperforms previous state-of-the-art methods by 20% in terms of average precision.
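The abstract describes recovering a point cloud from disparity, restricted to pixels on objects of interest. A minimal sketch of that step follows, assuming a rectified pinhole stereo pair where depth is `fx * baseline / disparity`. This is standard stereo geometry, not the authors' implementation; the function name `instance_disparity_to_points`, its parameters, and the toy values are all hypothetical.

```python
def instance_disparity_to_points(disparity, mask, fx, fy, cx, cy, baseline):
    """Back-project masked disparity pixels to 3D camera-frame points.

    disparity: 2D list of disparities in pixels, indexed disparity[v][u].
    mask:      2D list of booleans, True on pixels of the object of interest.
    fx, fy, cx, cy: assumed pinhole intrinsics in pixels.
    baseline:  assumed stereo baseline in metres.
    """
    points = []
    for v, (disp_row, mask_row) in enumerate(zip(disparity, mask)):
        for u, (d, keep) in enumerate(zip(disp_row, mask_row)):
            # Skip background pixels and invalid (non-positive) disparities.
            if not keep or d <= 0:
                continue
            z = fx * baseline / d      # depth from disparity
            x = (u - cx) * z / fx      # lateral offset from the principal point
            y = (v - cy) * z / fy      # vertical offset from the principal point
            points.append((x, y, z))
    return points


# Toy 2x2 example with made-up calibration values.
disparity = [[0.0, 10.0],
             [20.0, 40.0]]
mask = [[True, True],
        [False, True]]
points = instance_disparity_to_points(
    disparity, mask, fx=100.0, fy=100.0, cx=0.5, cy=0.5, baseline=0.5
)
print(len(points))  # 2: one pixel is masked out, one has zero disparity
```

Only the masked pixels with valid disparity contribute points, so the resulting cloud covers the object instance rather than the whole image.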
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Object Detection | KITTI car (test) | AP3D (Easy) | 59.6 | 195 |
| 3D Object Detection | KITTI car (val) | -- | -- | 62 |
| Bird's Eye View Object Detection (Car) | KITTI (test) | APBEV (Easy) @ IoU=0.7 | 79.76 | 59 |
| 3D Object Detection (Car) | KITTI (test) | AP3D (Easy) @ IoU=0.7 | 68.21 | 36 |
| Bird's Eye View Detection | KITTI (val) | APBEV (IoU=0.7, Easy) | 77.63 | 36 |
| 3D Object Detection (Cyclists) | KITTI (test) | AP (Easy) | 40.05 | 27 |
| 3D Object Detection | KITTI official (test) | APBEV (Easy) | 79.76 | 19 |
| Joint vehicle detection and pose estimation | KITTI car (test) | AOS (Easy) | 93.02 | 15 |
| 2D Car Detection | KITTI (test) | AP2D Easy | 93.45 | 14 |
| 3D Object Detection | KITTI (test) | Pedestrian AP3D Easy | 37.12 | 9 |