
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models

About

Foundation models have achieved remarkable results in 2D and language tasks like image segmentation, object detection, and visual-language understanding. However, their potential to enrich 3D scene representation learning is largely untapped due to the existence of the domain gap. In this work, we propose an innovative methodology called Bridge3D to address this gap by pre-training 3D models using features, semantic masks, and captions sourced from foundation models. Specifically, our method employs semantic masks from foundation models to guide the masking and reconstruction process for the masked autoencoder, enabling more focused attention on foreground representations. Moreover, we bridge the 3D-text gap at the scene level using image captioning foundation models, thereby facilitating scene-level knowledge distillation. We further extend this bridging effort by introducing an innovative object-level knowledge distillation method that harnesses highly accurate object-level masks and semantic text data from foundation models. Our methodology significantly surpasses the performance of existing state-of-the-art methods in 3D object detection and semantic segmentation tasks. For instance, on the ScanNet dataset, Bridge3D improves the baseline by a notable margin of 6.3%. Code will be available at: https://github.com/Zhimin-C/Bridge3D
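The mask-guided masking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `foreground_guided_mask`, the `fg_weight` and `mask_ratio` parameters, and the per-patch boolean input are all assumptions made for illustration. The sketch assumes that each 3D patch has already been labelled foreground or background from a projected 2D semantic mask, and that foreground patches should simply be sampled for masking more often than background ones.

```python
import random


def foreground_guided_mask(is_foreground, mask_ratio=0.6, fg_weight=4.0, seed=0):
    """Choose which patches to mask, favouring foreground patches.

    is_foreground: list of bools, one per patch (hypothetical input, assumed
    to come from a semantic mask projected onto the point cloud).
    mask_ratio:    fraction of patches to mask (assumed value).
    fg_weight:     relative sampling weight of a foreground patch (assumed).
    Returns a sorted list of masked patch indices.
    """
    rng = random.Random(seed)
    num_masked = int(round(len(is_foreground) * mask_ratio))

    # Weighted sampling without replacement (Efraimidis-Spirakis):
    # draw u in [0, 1), score each patch by u ** (1 / weight),
    # and keep the patches with the largest scores.
    scored = []
    for index, fg in enumerate(is_foreground):
        weight = fg_weight if fg else 1.0
        scored.append((rng.random() ** (1.0 / weight), index))
    scored.sort(reverse=True)

    return sorted(index for _, index in scored[:num_masked])


if __name__ == "__main__":
    # Five foreground patches followed by five background patches.
    flags = [True] * 5 + [False] * 5
    print(foreground_guided_mask(flags))
```

In a masked autoencoder, the returned indices would be hidden from the encoder and reconstructed by the decoder; weighting the sampling toward foreground is one simple way to focus reconstruction on objects rather than walls and floors.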

Zhimin Chen, Longlong Jing, Yingwei Li, Bing Li • 2023

Related benchmarks

Task                      Dataset             Metric    Result  Rank
Semantic segmentation     S3DIS (Area 5)      mIoU      70.2    799
3D Object Detection       ScanNet V2 (val)    mAP@0.25  69.1    352
3D Semantic Segmentation  ScanNet V2 (val)    mIoU      73.9    171
3D Object Detection       SUN RGB-D           mAP@0.25  67.9    104
Semantic segmentation     S3DIS               mIoU      70.2    88
3D Object Detection       SUN RGB-D v1 (val)  mAP@0.25  67.9    81
Semantic segmentation     ScanNet             mIoU      73.9    59
3D Object Detection       ScanNet V2          AP50      51.9    54
3D Semantic Segmentation  S3DIS               mIoU      70.2    20
3D Semantic Segmentation  ScanNet V2          mIoU      73.9    8
Showing 10 of 11 rows
