
Decoupling Common and Unique Representations for Multimodal Self-supervised Learning

About

The increasing availability of multi-sensor data sparks wide interest in multimodal self-supervised learning. However, most existing approaches learn only common representations across modalities while ignoring intra-modal training and modality-unique representations. We propose Decoupling Common and Unique Representations (DeCUR), a simple yet effective method for multimodal self-supervised learning. By distinguishing inter- and intra-modal embeddings through multimodal redundancy reduction, DeCUR can integrate complementary information across different modalities. We evaluate DeCUR in three common multimodal scenarios (radar-optical, RGB-elevation, and RGB-depth) and demonstrate consistent improvements regardless of architecture, in both multimodal and modality-missing settings. With thorough experiments and comprehensive analysis, we hope this work can provide valuable insights and raise more interest in researching the hidden relationships of multimodal representations.

Yi Wang, Conrad M Albrecht, Nassim Ait Ali Braham, Chenying Liu, Zhitong Xiong, Xiao Xiang Zhu • 2023
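
The abstract describes decoupling common and unique embedding dimensions through multimodal redundancy reduction. The idea might be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Barlow Twins-style cross-correlation objective, and the function names, the `n_common` split point, and the `lam` off-diagonal weight are all hypothetical choices made for this example.

```python
import numpy as np


def cross_correlation(z_a, z_b, eps=1e-8):
    """Cross-correlation matrix (D, D) between two embedding batches (N, D)."""
    # Standardize each embedding dimension over the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    return z_a.T @ z_b / z_a.shape[0]


def off_diagonal_sq_sum(c):
    """Sum of squared off-diagonal entries of a square matrix."""
    return float((c ** 2).sum() - (np.diag(c) ** 2).sum())


def cross_modal_loss(z_m1, z_m2, n_common, lam=0.005):
    """Assumed inter-modal objective: the first n_common dimensions are
    pushed to correlate across modalities (diagonal -> 1), the remaining
    "unique" dimensions are pushed to decorrelate (diagonal -> 0)."""
    c = cross_correlation(z_m1, z_m2)
    c_common = c[:n_common, :n_common]
    c_unique = c[n_common:, n_common:]
    loss_common = ((1.0 - np.diag(c_common)) ** 2).sum() \
        + lam * off_diagonal_sq_sum(c_common)
    loss_unique = (np.diag(c_unique) ** 2).sum() \
        + lam * off_diagonal_sq_sum(c_unique)
    return float(loss_common + loss_unique)


def intra_modal_loss(z_view1, z_view2, lam=0.005):
    """Assumed intra-modal objective: redundancy reduction over all
    dimensions between two augmented views of the same modality."""
    c = cross_correlation(z_view1, z_view2)
    return float(((1.0 - np.diag(c)) ** 2).sum() + lam * off_diagonal_sq_sum(c))


# Toy usage with random stand-ins for encoder outputs (batch 64, 8 dims).
rng = np.random.default_rng(0)
z_radar = rng.normal(size=(64, 8))
z_optical = rng.normal(size=(64, 8))
total = (
    cross_modal_loss(z_radar, z_optical, n_common=6)
    + intra_modal_loss(z_radar, z_radar)
    + intra_modal_loss(z_optical, z_optical)
)
```

In this sketch the split between common and unique dimensions is a fixed hyperparameter. Under these assumptions, the intra-modal terms keep the unique dimensions informative while the cross-modal term drives them to decorrelate across modalities.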

Related benchmarks

Task                         Dataset                    Metric                 Result  Rank
Segmentation                 m-SA crop-type             Mean mIoU              34.49   27
Segmentation                 m-chesapeake               Mean mIoU              69.83   23
Classification               m-so2sat GEO-Bench         Overall Accuracy       61.7    22
Classification               m-eurosat GEO-Bench        Overall Accuracy       97.9    20
Classification               m-brick-kiln GEO-Bench     Overall Accuracy (OA)  98.7    20
Field Boundary Segmentation  FTW (test)                 Pixel IoU              49      19
Classification               m-so2sat (test)            Mean Accuracy          56.68   17
Flood Inundation Mapping     Sen1Flood11                mIoU                   86.87   15
Multi-Label Classification   m-bigearthnet GeoBench     F1 Score               70.9    14
Segmentation                 m-cashew GeoBench          mIoU                   84.15   14
Showing 10 of 29 rows
