Is Contrastive Distillation Enough for Learning Comprehensive 3D Representations?

About

Cross-modal contrastive distillation has recently been explored for learning effective 3D representations. However, existing methods focus primarily on modality-shared features and neglect modality-specific features during pre-training, which leads to suboptimal representations. In this paper, we theoretically analyze the limitations of current contrastive methods for 3D representation learning and propose a new framework, namely CMCR (Cross-Modal Comprehensive Representation Learning), to address these shortcomings. Our approach improves upon traditional methods by better integrating both modality-shared and modality-specific features. Specifically, we introduce masked image modeling and occupancy estimation tasks to guide the network in learning more comprehensive modality-specific features. Furthermore, we propose a novel multi-modal unified codebook that learns an embedding space shared across different modalities. In addition, we introduce geometry-enhanced masked image modeling to further boost 3D representation learning. Extensive experiments demonstrate that our method mitigates the challenges faced by traditional approaches and consistently outperforms existing image-to-LiDAR contrastive distillation methods in downstream tasks. Code will be available at https://github.com/Eaphan/CMCR.
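
For context, the image-to-LiDAR contrastive distillation baseline that the abstract refers to is commonly formulated as an InfoNCE-style loss over paired point and pixel features. The sketch below illustrates that generic loss only; it is not the CMCR implementation, and the names `l2_normalize` and `info_nce`, the temperature value, and the toy feature vectors are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(point_feats, pixel_feats, temperature=0.07):
    """Mean InfoNCE loss over paired features.

    Row i of point_feats and row i of pixel_feats form a positive pair;
    every other pixel feature in the batch acts as a negative.
    The temperature of 0.07 is an illustrative default, not a value
    taken from the paper.
    """
    points = [l2_normalize(p) for p in point_feats]
    pixels = [l2_normalize(q) for q in pixel_feats]
    total = 0.0
    for i, p in enumerate(points):
        # Cosine similarity of point i to every pixel feature, scaled by temperature.
        logits = [sum(a * b for a, b in zip(p, q)) / temperature for q in pixels]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability of the matching pixel.
        total += log_denominator - logits[i]
    return total / len(points)


# Toy check: correctly paired features give a lower loss than mismatched ones.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

A loss of this form rewards only what the paired point and pixel features agree on, which is consistent with the abstract's observation that such methods focus on modality-shared features.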

Yifan Zhang, Junhui Hou • 2024

Related benchmarks

Task	Dataset	Result
3D Object Detection	nuScenes (val)	NDS 62	217
Semantic segmentation	nuScenes 1.0 (val)	mIoU 76.34	81
Semantic segmentation	SemanticKITTI SynLiDAR source (val)	mIoU (Mean IoU) 53.58	33
Semantic segmentation	SemanticKITTI v1.0 (val)	mIoU 49.86	30
LiDAR Semantic Segmentation	SemanticSTF (val)	mIoU 60.71	16
Panoptic Segmentation	nuScenes 1% labels (val)	PQ 20.7	16
Semantic segmentation	ScribbleKITTI (val)	mIoU 55.36	12
Semantic segmentation	RELLIS-3D (val)	mIoU 56.4	12
Semantic segmentation	SemanticPOSS (val)	mIoU 58.63	12
Semantic segmentation	DAPS-3D (val)	mIoU 87.29	12

