Voxel Field Fusion for 3D Object Detection

About

In this work, we present a conceptually simple yet effective framework for cross-modality 3D object detection, named voxel field fusion. The proposed approach aims to maintain cross-modality consistency by representing and fusing augmented image features as a ray in the voxel field. To this end, a learnable sampler is first designed to sample vital features from the image plane, which are projected to the voxel grid in a point-to-ray manner; this maintains consistency in feature representation with spatial context. In addition, ray-wise fusion is conducted to fuse features with the supplemental context in the constructed voxel field. We further develop a mixed augmentor to align feature-variant transformations, which bridges the modality gap in data augmentation. The proposed framework is demonstrated to achieve consistent gains on various benchmarks and outperforms previous fusion-based methods on the KITTI and nuScenes datasets. Code is made available at https://github.com/dvlab-research/VFF.

Yanwei Li, Xiaojuan Qi, Yukang Chen, Liwei Wang, Zeming Li, Jian Sun, Jiaya Jia • 2022
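
The point-to-ray projection described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (that lives in the linked repository). It assumes a pinhole camera with intrinsics `K`, a voxel grid expressed in the camera frame, fixed depth samples along the ray, and plain addition as a stand-in for the paper's ray-wise fusion. The learnable sampler is omitted, and the names `pixel_to_ray_voxels` and `fuse_ray` are hypothetical.

```python
import numpy as np

def pixel_to_ray_voxels(u, v, K, depths, grid_min, voxel_size, grid_shape):
    """Return the unique voxel indices crossed by the camera ray through pixel (u, v)."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    # Back-project the pixel to a ray direction on the z = 1 plane (camera frame).
    direction = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    # Sample 3D points along the ray at the given depths: shape (D, 3).
    points = depths[:, None] * direction[None, :]
    # Convert metric coordinates to integer voxel indices.
    idx = np.floor((points - grid_min) / voxel_size).astype(int)
    # Discard samples that fall outside the grid.
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    return np.unique(idx[inside], axis=0)

def fuse_ray(voxel_feats, ray_idx, image_feat):
    """Add one image feature vector to every voxel along the ray (assumed sum fusion)."""
    fused = voxel_feats.copy()
    fused[ray_idx[:, 0], ray_idx[:, 1], ray_idx[:, 2]] += image_feat
    return fused

# Illustrative values only: a 100 px focal length and an 8 x 8 x 8 grid of 1 m voxels.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
depths = np.arange(0.5, 8.0, 1.0)
grid_min = np.array([-4.0, -4.0, 0.0])
ray = pixel_to_ray_voxels(50, 50, K, depths, grid_min, 1.0, (8, 8, 8))
fused = fuse_ray(np.zeros((8, 8, 8, 4)), ray, np.ones(4))
```

Writing one pixel's feature into every voxel along its ray, rather than into a single voxel at a known depth, is what distinguishes a point-to-ray mapping from a point-to-point one in this sketch.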

Related benchmarks

Task	Dataset	Metric	Value	
3D Object Detection	nuScenes (test)	mAP	68.4	903
3D Instance Segmentation	ScanNet V2 (val)	Average AP50	64.3	198
3D Instance Segmentation	S3DIS (Area 5)	mAP@50% IoU	59.3	120
3D Object Detection	KITTI (test)	3D AP Easy	89.58	61
3D Object Detection	KITTI (val)	--	--	24
3D Object Detection	KITTI	3D AP (Car, Moderate)	85.51	15
3D Instance Segmentation	ScanNet (test)	mAP	50.6	15
