
Self-Supervised Pretraining of 3D Features on any Point-Cloud

About

Pretraining on large labeled datasets is a prerequisite for good performance in many computer vision tasks such as 2D object recognition and video classification. However, pretraining is not widely used for 3D recognition tasks, where state-of-the-art methods train models from scratch. A primary reason is the lack of large annotated datasets, because 3D data is both difficult to acquire and time-consuming to label. We present a simple self-supervised pretraining method that works with any 3D data, single- or multi-view, indoor or outdoor, acquired by varied sensors, without requiring 3D registration. We pretrain standard point-cloud- and voxel-based model architectures, and show that joint pretraining further improves performance. We evaluate our models on 9 benchmarks for object detection, semantic segmentation, and object classification, where they achieve state-of-the-art results and can outperform supervised pretraining. We set a new state-of-the-art for object detection on ScanNet (69.0% mAP) and SUN RGB-D (63.5% mAP). Our pretrained models are label-efficient and improve performance for classes with few examples.
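The abstract does not spell out the training objective, but self-supervised pretraining on point clouds is typically contrastive: embeddings of two augmented views of the same scene are pulled together while other scenes in the batch act as negatives. A minimal, illustrative sketch of such an InfoNCE-style loss (not the paper's exact implementation; the function and variable names here are hypothetical):

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE loss between two batches of embeddings.

    z1, z2: (N, D) arrays, where z1[i] and z2[i] are embeddings of two
    augmented views of the same point cloud; all other pairs are negatives.
    """
    # L2-normalize so the dot product is cosine similarity
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    # cross-entropy with the matching view (the diagonal) as the positive
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

# Toy batch of 4 "point cloud" embeddings
rng = np.random.default_rng(0)
anchor = rng.normal(size=(4, 32))
positive = anchor + 0.01 * rng.normal(size=(4, 32))  # slightly perturbed view
unrelated = rng.normal(size=(4, 32))                 # mismatched view

loss_pos = info_nce_loss(anchor, positive)
loss_neg = info_nce_loss(anchor, unrelated)
print(loss_pos < loss_neg)  # aligned views yield a lower loss
```

Minimizing this loss encourages the encoder to produce features that are invariant to the augmentations while remaining discriminative across scenes, which is what makes the pretrained features transferable to detection and segmentation.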

Zaiwei Zhang, Rohit Girdhar, Armand Joulin, Ishan Misra • 2021

Related benchmarks

| Task                     | Dataset            | Metric              | Result | Rank |
|--------------------------|--------------------|---------------------|--------|------|
| Semantic segmentation    | S3DIS (Area 5)     | mIoU                | 70.6   | 907  |
| 3D Object Detection      | ScanNet V2 (val)   | mAP@0.25            | 64     | 361  |
| Semantic segmentation    | ScanNet V2 (val)   | mIoU                | 71.2   | 316  |
| Semantic segmentation    | ScanNet (val)      | mIoU                | 71.2   | 274  |
| Semantic segmentation    | nuScenes (val)     | mIoU (Segmentation) | 0.317  | 265  |
| 3D Semantic Segmentation | ScanNet V2 (val)   | mIoU                | 73.1   | 209  |
| Object Classification    | ModelNet40 (test)  | --                  | --     | 180  |
| Semantic segmentation    | SemanticKITTI (val)| mIoU                | 41.5   | 174  |
| 3D Object Detection      | SUN RGB-D (val)    | mAP@0.25            | 60.4   | 163  |
| 3D Object Detection      | ScanNet            | mAP@0.25            | 62.1   | 127  |

Showing 10 of 31 rows
