Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal 3D Object Detection
About
In this work, we propose a novel method termed Frustum ConvNet (F-ConvNet) for amodal 3D object detection from point clouds. Given 2D region proposals in an RGB image, our method first generates a sequence of frustums for each region proposal and uses the obtained frustums to group local points. F-ConvNet aggregates point-wise features into frustum-level feature vectors and arranges these vectors into a feature map for its subsequent fully convolutional network (FCN), which spatially fuses frustum-level features and supports end-to-end, continuous estimation of oriented boxes in 3D space. We also propose component variants of F-ConvNet, including an FCN variant that extracts multi-resolution frustum features and a refined use of F-ConvNet over a reduced 3D space. Careful ablation studies verify the efficacy of these component variants. F-ConvNet assumes no prior knowledge of the working 3D environment and is thus dataset-agnostic. We present experiments on both the indoor SUN-RGBD and outdoor KITTI datasets. F-ConvNet outperforms all existing methods on SUN-RGBD, and at the time of submission it outperforms all published works on the KITTI benchmark. Code has been made available at: https://github.com/zhixinwang/frustum-convnet.
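The grouping and aggregation step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `group_points_by_frustum`, the parameters `stride`, `height`, and `num_slices`, and the assumption that column 2 of `points` is depth along the frustum axis are all hypothetical. Max pooling over precomputed per-point features stands in for the learned point-wise feature extractor, and empty slices are filled with zeros as a simplification.

```python
import numpy as np


def group_points_by_frustum(points, feats, stride, height, num_slices):
    """Slide overlapping depth slices along the frustum axis and pool features.

    points: (N, 3) array; column 2 is assumed to be depth along the frustum axis.
    feats:  (N, d) per-point features (a stand-in for learned point-wise features).
    Returns a (num_slices, d) feature map, one row per frustum slice.
    """
    depth = points[:, 2]
    feature_map = np.zeros((num_slices, feats.shape[1]), dtype=feats.dtype)
    for t in range(num_slices):
        near = t * stride          # slice t starts at depth t * stride
        far = near + height        # and spans `height` along the axis
        mask = (depth >= near) & (depth < far)
        if mask.any():
            # Max-pool the features of all points that fall in this slice.
            feature_map[t] = feats[mask].max(axis=0)
    return feature_map


# Usage: 100 random points with 4-dimensional features, 4 overlapping slices.
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 5.0, size=(100, 3))
per_point = rng.normal(size=(100, 4))
fmap = group_points_by_frustum(pts, per_point, stride=1.0, height=2.0, num_slices=4)
print(fmap.shape)  # (4, 4)
```

With `height` larger than `stride`, neighboring slices overlap, so a point can contribute to more than one row. The resulting rows form a 1D sequence along depth, which is the kind of feature map a fully convolutional network could then process.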
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Object Detection | KITTI (val) | AP3D (Moderate) | 78.8 | 85 |
| 3D Object Detection | KITTI (test) | AP_3D Car (Easy) | 87.36 | 60 |
| 3D Object Detection | KITTI (test) | AP Car (IoU=0.7) Easy | 87.36 | 38 |
| 3D Object Detection | KITTI (test) | AP (Easy) | 87.36 | 27 |
| Bird's Eye View Object Detection (Pedestrian) | KITTI (test) | AP (Easy) | 57.04 | 27 |
| 3D Object Detection | KITTI Pedestrian official (test) | AP (Easy) | 52.16 | 19 |
| BEV Object Detection | KITTI (val) | AP_BEV Easy | 90.23 | 14 |
| 3D Object Localization (BEV) | KITTI (test) | AP (Cars, Easy) | 89.69 | 9 |
| 3D Object Detection | SUN-RGBD (test) | AP (bathtub) | 61.32 | 7 |