Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning

About

As a pioneering work, PointContrast conducts unsupervised 3D representation learning via leveraging contrastive learning over raw RGB-D frames and proves its effectiveness on various downstream tasks. However, the trend of large-scale unsupervised learning in 3D has yet to emerge due to two stumbling blocks: the inefficiency of matching RGB-D frames as contrastive views and the annoying mode collapse phenomenon mentioned in previous works. Turning the two stumbling blocks into empirical stepping stones, we first propose an efficient and effective contrastive learning framework, which generates contrastive views directly on scene-level point clouds by a well-curated data augmentation pipeline and a practical view mixing strategy. Second, we introduce reconstructive learning on the contrastive learning framework with an exquisite design of contrastive cross masks, which targets the reconstruction of point color and surfel normal. Our Masked Scene Contrast (MSC) framework is capable of extracting comprehensive 3D representations more efficiently and effectively. It accelerates the pre-training procedure by at least 3x and still achieves an uncompromised performance compared with previous work. Besides, MSC also enables large-scale 3D pre-training across multiple datasets, which further boosts the performance and achieves state-of-the-art fine-tuning results on several downstream tasks, e.g., 75.5% mIoU on ScanNet semantic segmentation validation set.

Xiaoyang Wu, Xin Wen, Xihui Liu, Hengshuang Zhao • 2023
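The abstract does not spell out how the contrastive cross masks are built, so the following is only a minimal sketch under stated assumptions: points are grouped into cubic patches, and the two views hide disjoint sets of patches, so that a point masked in one view stays visible in the other. The function name `cross_masks` and the parameters `patch_size`, `mask_rate`, and `seed` are hypothetical and do not come from the paper or its code.

```python
import math
import random


def cross_masks(coords, patch_size=0.1, mask_rate=0.5, seed=0):
    """Return two boolean masks (True = hidden) that never hide the same point.

    Assumption for illustration: masking is done per cubic patch, and the two
    views receive non-overlapping sets of hidden patches.
    """
    if not 0.0 < mask_rate <= 0.5:
        raise ValueError("mask_rate must be in (0, 0.5] so the masks stay disjoint")
    rng = random.Random(seed)

    # Assign each point to a cubic patch by flooring its scaled coordinates.
    patch_of = [tuple(math.floor(c / patch_size) for c in p) for p in coords]

    # Shuffle the occupied patches, then give disjoint slices to the two views.
    patches = sorted(set(patch_of))
    rng.shuffle(patches)
    k = int(len(patches) * mask_rate)
    hidden_a = set(patches[:k])
    hidden_b = set(patches[k:2 * k])

    mask_a = [p in hidden_a for p in patch_of]
    mask_b = [p in hidden_b for p in patch_of]
    return mask_a, mask_b
```

Under this assumption, the color and normal of the points hidden in one view would be the reconstruction targets for that view, while the other view still sees those points unmasked.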

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic segmentation | S3DIS (Area 5) | mIoU 71.6 | 799 |
| Semantic segmentation | ScanNet V2 (val) | mIoU 78.2 | 288 |
| Semantic segmentation | ScanNet v2 (test) | mIoU 78.2 | 248 |
| Semantic segmentation | ScanNet (val) | mIoU 78.2 | 231 |
| 3D Instance Segmentation | ScanNet V2 (val) | Average AP50 59.6 | 195 |
| 3D Visual Grounding | ScanRefer (val) | -- | 155 |
| 3D Instance Segmentation | S3DIS (Area 5) | mAP@50% IoU 50.5 | 106 |
| Semantic segmentation | ScanNet200 (val) | mIoU 33.4 | 74 |
| Semantic segmentation | ScanNet | mIoU 58.2 | 59 |
| Instance Segmentation | ScanNetV2 (val) | mAP@0.5 59.6 | 58 |
Showing 10 of 42 rows
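Most rows above report mIoU (mean intersection over union). As a reference for reading those numbers, here is a minimal sketch of the metric on per-point labels. It assumes that classes absent from both prediction and ground truth are skipped; individual benchmarks may handle ignored labels and absent classes differently, and the function name `mean_iou` is chosen here for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Average the per-class IoU over classes that appear in pred or target."""
    ious = []
    for c in range(num_classes):
        # Points where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Points where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

The table values are this quantity expressed as a percentage.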
