ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection

About

In this paper, we introduce the task of multi-view RGB-based 3D object detection as an end-to-end optimization problem. To address this problem, we propose ImVoxelNet, a novel fully convolutional method of 3D object detection based on monocular or multi-view RGB images. The number of monocular images in each multi-view input can vary during training and inference; in fact, this number may be different for each multi-view input. ImVoxelNet successfully handles both indoor and outdoor scenes, which makes it general-purpose. Specifically, it achieves state-of-the-art results in car detection on the KITTI (monocular) and nuScenes (multi-view) benchmarks among all methods that accept RGB images. Moreover, it surpasses existing RGB-based 3D object detection methods on the SUN RGB-D dataset. On ScanNet, ImVoxelNet sets a new benchmark for multi-view 3D object detection. The source code and the trained models are available at https://github.com/saic-vul/imvoxelnet.

Danila Rukhovich, Anna Vorontsova, Anton Konushin • 2021
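
The abstract states that image features are projected into a voxel volume and that the number of views per input may vary. The sketch below illustrates one way this could work. It is not taken from the authors' code: the pinhole projection matrices, the nearest-pixel lookup, the mean aggregation over views, and all function names (`project_point`, `sample_feature`, `aggregate_voxel_features`) are assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Matrix = Sequence[Sequence[float]]  # assumed 3x4 projection matrix P = K [R | t]


def project_point(P: Matrix, point: Vec3) -> Optional[Tuple[float, float]]:
    """Project a 3D point to pixel coordinates (u, v); None if behind the camera."""
    x, y, z = point
    u, v, w = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in P)
    if w <= 1e-6:
        return None
    return u / w, v / w


def sample_feature(feature_map, pixel) -> Optional[List[float]]:
    """Nearest-pixel lookup in an H x W x C nested list; None if outside the image."""
    if pixel is None:
        return None
    col, row = math.floor(pixel[0]), math.floor(pixel[1])
    if 0 <= row < len(feature_map) and 0 <= col < len(feature_map[0]):
        return feature_map[row][col]
    return None


def aggregate_voxel_features(voxel_centers, views):
    """Average per-view features for each voxel center.

    `views` is a list of (P, feature_map) pairs of any length, so the same
    function handles monocular and multi-view inputs. Voxels that no view
    observes get None (an illustrative choice).
    """
    volume = []
    for center in voxel_centers:
        hits = []
        for P, feature_map in views:
            feat = sample_feature(feature_map, project_point(P, center))
            if feat is not None:
                hits.append(feat)
        if hits:
            channels = len(hits[0])
            volume.append([sum(h[c] for h in hits) / len(hits) for c in range(channels)])
        else:
            volume.append(None)
    return volume
```

Because the average runs only over the views in which a voxel is visible, the same routine accepts one image or many without any change to the output shape, which is the property the abstract emphasizes.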

Related benchmarks

Task | Dataset | Result | Rank
3D Object Detection | nuScenes (val) | -- | 941
3D Object Detection | ScanNet V2 (val) | mAP@0.25: 46.7 | 352
3D Object Detection | KITTI car (test) | AP3D (Easy): 17.15 | 195
3D Object Detection | View-of-Delft (VoD) Entire Annotated Area (val) | mAP3D: 14.17 | 86
3D Object Detection | ScanNet (val) | -- | 66
3D Object Detection | SUN RGB-D (test) | -- | 64
3D Object Detection | KITTI car (val) | AP 3D Easy: 17.85 | 62
3D Object Detection | KITTI (test) | -- | 60
3D Object Detection | ScanNet V2 | AP50: 23.8 | 54
Bird's eye view object detection | KITTI (test) | APBEV@0.7 (Easy): 25.19 | 53
Showing 10 of 48 rows
