PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation
About
We present PointFusion, a generic 3D object detection method that leverages both image and 3D point cloud information. Unlike existing methods that either use multi-stage pipelines or hold sensor- and dataset-specific assumptions, PointFusion is conceptually simple and application-agnostic. The image data and the raw point cloud data are independently processed by a CNN and a PointNet architecture, respectively. The resulting outputs are then combined by a novel fusion network, which predicts multiple 3D box hypotheses and their confidences, using the input 3D points as spatial anchors. We evaluate PointFusion on two distinctive datasets: the KITTI dataset, which features driving scenes captured with a lidar-camera setup, and the SUN-RGBD dataset, which captures indoor environments with RGB-D cameras. Our model is the first to perform better than or on par with the state of the art on these diverse datasets without any dataset-specific model tuning.
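The core of the dense prediction scheme described above is that every input 3D point acts as a spatial anchor: the network regresses, for each point, offsets to the 8 corners of a candidate box plus a confidence score, and the highest-confidence hypothesis is kept. The sketch below illustrates only this selection step in NumPy; the function name and the array shapes are our assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def select_box_hypothesis(points, corner_offsets, scores):
    """Pick the highest-confidence 3D box hypothesis (illustrative sketch).

    points:         (N, 3)    input 3D points used as spatial anchors
    corner_offsets: (N, 8, 3) predicted offsets from each anchor point to the
                              8 box corners (hypothetical network output)
    scores:         (N,)      per-anchor confidence scores
    """
    best = int(np.argmax(scores))
    # Each hypothesis places the 8 corners relative to its own anchor point,
    # so the absolute corner positions are anchor + predicted offsets.
    corners = points[best] + corner_offsets[best]
    return corners, float(scores[best])
```

In practice such per-point hypotheses make the regression target translation-invariant: each anchor only predicts small corner offsets relative to itself rather than absolute scene coordinates.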
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Object Detection | SUN RGB-D (val) | mAP@0.25 | 45.4 | 158 |
| 6D Pose Estimation | YCB-Video | -- | -- | 148 |
| 3D Object Detection | KITTI (val) | -- | -- | 85 |
| 3D Object Detection | SUN RGB-D v1 (val) | -- | -- | 81 |
| 6DoF Pose Estimation | YCB-Video (test) | 2D Error < 2cm Rate | 74.1 | 72 |
| 3D Object Detection | SUN RGB-D (test) | mAP@0.25 | 45.4 | 64 |
| 6D Pose Estimation | LineMod (test) | -- | -- | 29 |
| Object affordance anticipation | PIAD (Seen) | AUC | 77.5 | 13 |
| 3D Affordance Learning | PIAD (Unseen) | aIoU | 5.3 | 9 |
| 3D Object Detection | SUN-RGBD (test) | AP (bathtub) | 37.26 | 7 |