
Self-Supervised Pretraining of 3D Features on any Point-Cloud

About

Pretraining on large labeled datasets is a prerequisite for good performance in many computer vision tasks such as 2D object recognition and video classification. However, pretraining is not widely used for 3D recognition tasks, where state-of-the-art methods train models from scratch. A primary reason is the lack of large annotated datasets, because 3D data is both difficult to acquire and time-consuming to label. We present a simple self-supervised pretraining method that works with any 3D data, single- or multi-view, indoor or outdoor, acquired by varied sensors, without requiring 3D registration. We pretrain standard point-cloud- and voxel-based model architectures, and show that joint pretraining further improves performance. We evaluate our models on 9 benchmarks for object detection, semantic segmentation, and object classification, where they achieve state-of-the-art results and can outperform supervised pretraining. We set a new state-of-the-art for object detection on ScanNet (69.0% mAP) and SUN RGB-D (63.5% mAP). Our pretrained models are label-efficient and improve performance for classes with few examples.
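The abstract does not spell out the training objective, but self-supervised pretraining on point clouds is typically contrastive: embeddings of two augmented views of the same scene are pulled together while other scenes in the batch act as negatives. A minimal, illustrative sketch of such an InfoNCE-style loss (not the paper's exact implementation; the function and variable names here are hypothetical):

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE loss between two batches of embeddings.

    z1, z2: (N, D) arrays, where z1[i] and z2[i] are embeddings of two
    augmented views of the same point cloud; all other pairs are negatives.
    """
    # L2-normalize so the dot product is cosine similarity
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    # cross-entropy with the matching view (the diagonal) as the positive
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

# Toy batch of 4 "point cloud" embeddings
rng = np.random.default_rng(0)
anchor = rng.normal(size=(4, 32))
positive = anchor + 0.01 * rng.normal(size=(4, 32))  # slightly perturbed view
unrelated = rng.normal(size=(4, 32))                 # mismatched view

loss_pos = info_nce_loss(anchor, positive)
loss_neg = info_nce_loss(anchor, unrelated)
print(loss_pos < loss_neg)  # aligned views yield a lower loss
```

Minimizing this loss encourages the encoder to produce features that are invariant to the augmentations while remaining discriminative across scenes, which is what makes the pretrained features transferable to detection and segmentation.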

Zaiwei Zhang, Rohit Girdhar, Armand Joulin, Ishan Misra • 2021

Related benchmarks

| Task                     | Dataset            | Metric              | Result | Rank |
|--------------------------|--------------------|---------------------|--------|------|
| Semantic segmentation    | S3DIS (Area 5)     | mIoU                | 70.6   | 907  |
| 3D Object Detection      | ScanNet V2 (val)   | mAP@0.25            | 64     | 361  |
| Semantic segmentation    | ScanNet V2 (val)   | mIoU                | 71.2   | 316  |
| Semantic segmentation    | ScanNet (val)      | mIoU                | 71.2   | 274  |
| Semantic segmentation    | nuScenes (val)     | mIoU (Segmentation) | 0.317  | 265  |
| 3D Semantic Segmentation | ScanNet V2 (val)   | mIoU                | 73.1   | 209  |
| Object Classification    | ModelNet40 (test)  | --                  | --     | 180  |
| Semantic segmentation    | SemanticKITTI (val)| mIoU                | 41.5   | 174  |
| 3D Object Detection      | SUN RGB-D (val)    | mAP@0.25            | 60.4   | 163  |
| 3D Object Detection      | ScanNet            | mAP@0.25            | 62.1   | 127  |

Showing 10 of 31 rows
