Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models

About

Foundation models have achieved remarkable results in 2D and language tasks like image segmentation, object detection, and visual-language understanding. However, their potential to enrich 3D scene representation learning is largely untapped due to the existence of the domain gap. In this work, we propose an innovative methodology called Bridge3D to address this gap by pre-training 3D models using features, semantic masks, and captions sourced from foundation models. Specifically, our method employs semantic masks from foundation models to guide the masking and reconstruction process for the masked autoencoder, enabling more focused attention on foreground representations. Moreover, we bridge the 3D-text gap at the scene level using image captioning foundation models, thereby facilitating scene-level knowledge distillation. We further extend this bridging effort by introducing an innovative object-level knowledge distillation method that harnesses highly accurate object-level masks and semantic text data from foundation models. Our methodology significantly surpasses the performance of existing state-of-the-art methods in 3D object detection and semantic segmentation tasks. For instance, on the ScanNet dataset, Bridge3D improves the baseline by a notable margin of 6.3%. Code will be available at: https://github.com/Zhimin-C/Bridge3D
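The mask-guided masking idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the semantic masks steer the masking, so the weighted sampling below is an assumption, and `foreground_guided_mask`, `mask_ratio`, and `fg_weight` are hypothetical names. It only shows one plausible way to make a masked autoencoder hide foreground patches more often than background ones.

```python
import numpy as np


def foreground_guided_mask(is_foreground, mask_ratio=0.6, fg_weight=3.0, seed=0):
    """Choose which patch tokens to hide from the encoder.

    is_foreground: boolean array of shape (num_patches,). Assumed here to come
        from 2D semantic masks projected onto the point cloud.
    mask_ratio: fraction of patches to mask (hypothetical default).
    fg_weight: how much more likely a foreground patch is to be sampled
        than a background patch (hypothetical default).
    Returns a boolean array that is True where the patch is masked.
    """
    rng = np.random.default_rng(seed)
    is_foreground = np.asarray(is_foreground, dtype=bool)
    num_patches = is_foreground.shape[0]
    num_masked = int(round(mask_ratio * num_patches))

    # Assumption: foreground patches get a larger sampling weight.
    weights = np.where(is_foreground, fg_weight, 1.0)
    probs = weights / weights.sum()

    # Sample without replacement so exactly num_masked patches are hidden.
    masked_idx = rng.choice(num_patches, size=num_masked, replace=False, p=probs)
    mask = np.zeros(num_patches, dtype=bool)
    mask[masked_idx] = True
    return mask


if __name__ == "__main__":
    fg = np.array([True] * 4 + [False] * 6)
    print(foreground_guided_mask(fg, mask_ratio=0.6))
```

With `fg_weight=1.0` this reduces to uniform random masking, which makes it easy to compare against a plain masked autoencoder baseline.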

Zhimin Chen, Longlong Jing, Yingwei Li, Bing Li • 2023

Related benchmarks

Task	Dataset	Result
Semantic segmentation	S3DIS (Area 5)	mIoU 70.2	1006
3D Object Detection	ScanNet V2 (val)	mAP@0.25 69.1	361
3D Semantic Segmentation	ScanNet V2 (val)	mIoU 73.9	209
3D Object Detection	SUN RGB-D	mAP@0.25 67.9	107
Semantic segmentation	S3DIS	mIoU 70.2	93
3D Object Detection	SUN RGB-D v1 (val)	mAP@0.25 67.9	81
3D Object Detection	ScanNet V2	AP50 51.9	66
Semantic segmentation	ScanNet	mIoU 73.9	59
3D Semantic Segmentation	S3DIS	mIoU 70.2	27
3D Semantic Segmentation	ScanNet V2	mIoU 73.9	16
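For readers unfamiliar with the mIoU figures in the table, the sketch below shows how mean intersection-over-union is computed from per-point class labels. It is a generic illustration, not the evaluation code behind these results: the function name `mean_iou` is hypothetical, and benchmark protocols typically accumulate intersections and unions over a whole validation set and may ignore certain labels, which this version does not do.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean of per-class IoU over classes that appear in pred or target."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            # Class absent from both prediction and ground truth: skip it.
            continue
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious)) if ious else float("nan")


if __name__ == "__main__":
    # Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```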

