PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation
About
We present PointFusion, a generic 3D object detection method that leverages both image and 3D point cloud information. Unlike existing methods that either use multi-stage pipelines or hold sensor- and dataset-specific assumptions, PointFusion is conceptually simple and application-agnostic. The image data and the raw point cloud data are independently processed by a CNN and a PointNet architecture, respectively. The resulting outputs are then combined by a novel fusion network, which predicts multiple 3D box hypotheses and their confidences, using the input 3D points as spatial anchors. We evaluate PointFusion on two distinctive datasets: the KITTI dataset, which features driving scenes captured with a lidar-camera setup, and the SUN-RGBD dataset, which captures indoor environments with RGB-D cameras. Our model is the first to perform better than or on par with the state of the art on these diverse datasets without any dataset-specific model tuning.
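The core of the dense prediction scheme described above is that every input 3D point acts as a spatial anchor: the network regresses, for each point, offsets to the 8 corners of a candidate box plus a confidence score, and the highest-confidence hypothesis is kept. The sketch below illustrates only this selection step in NumPy; the function name and the array shapes are our assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def select_box_hypothesis(points, corner_offsets, scores):
    """Pick the highest-confidence 3D box hypothesis (illustrative sketch).

    points:         (N, 3)    input 3D points used as spatial anchors
    corner_offsets: (N, 8, 3) predicted offsets from each anchor point to the
                              8 box corners (hypothetical network output)
    scores:         (N,)      per-anchor confidence scores
    """
    best = int(np.argmax(scores))
    # Each hypothesis places the 8 corners relative to its own anchor point,
    # so the absolute corner positions are anchor + predicted offsets.
    corners = points[best] + corner_offsets[best]
    return corners, float(scores[best])
```

In practice such per-point hypotheses make the regression target translation-invariant: each anchor only predicts small corner offsets relative to itself rather than absolute scene coordinates.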
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Object Detection | SUN RGB-D (val) | mAP@0.25 | 45.4 | 158 |
| 6D Pose Estimation | YCB-Video | -- | -- | 148 |
| 3D Object Detection | KITTI (val) | -- | -- | 85 |
| 3D Object Detection | SUN RGB-D v1 (val) | -- | -- | 81 |
| 6DoF Pose Estimation | YCB-Video (test) | 2D Error < 2cm Rate | 74.1 | 72 |
| 3D Object Detection | SUN RGB-D (test) | mAP@0.25 | 45.4 | 64 |
| 6D Pose Estimation | LineMod (test) | -- | -- | 29 |
| Object affordance anticipation | PIAD (Seen) | AUC | 77.5 | 13 |
| 3D Affordance Learning | PIAD (Unseen) | aIoU | 5.3 | 9 |
| 3D Object Detection | SUN-RGBD (test) | AP (bathtub) | 37.26 | 7 |