CROMA: Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders

About

A vital and rapidly growing application, remote sensing offers vast yet sparsely labeled, spatially aligned multimodal data; this makes self-supervised learning algorithms invaluable. We present CROMA: a framework that combines contrastive and reconstruction self-supervised objectives to learn rich unimodal and multimodal representations. Our method separately encodes masked-out multispectral optical and synthetic aperture radar samples -- aligned in space and time -- and performs cross-modal contrastive learning. Another encoder fuses these sensors, producing joint multimodal encodings that are used to predict the masked patches via a lightweight decoder. We show that these objectives are complementary when leveraged on spatially aligned multimodal data. We also introduce X- and 2D-ALiBi, which spatially bias our cross- and self-attention matrices. These strategies improve representations and allow our models to effectively extrapolate to images up to 17.6x larger at test-time. CROMA outperforms the current SoTA multispectral model, evaluated on: four classification benchmarks -- finetuning (avg. 1.8%), linear (avg. 2.4%) and nonlinear (avg. 1.4%) probing, kNN classification (avg. 3.5%), and K-means clustering (avg. 8.4%); and three segmentation benchmarks (avg. 6.4%). CROMA's rich, optionally multimodal representations can be widely leveraged across remote sensing applications.
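
The abstract says only that 2D-ALiBi spatially biases the self-attention matrix, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes the bias is the negative Euclidean distance between patch positions on the 2D grid, scaled by a per-head slope borrowed from the original 1D ALiBi schedule. The function names (`alibi_slopes`, `alibi_2d_bias`, `biased_attention`) are hypothetical.

```python
import numpy as np


def alibi_slopes(num_heads: int) -> np.ndarray:
    """Per-head slopes 2^(-8/n), 2^(-16/n), ... (ALiBi-style schedule, assumed here)."""
    return 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)


def alibi_2d_bias(grid_h: int, grid_w: int, num_heads: int) -> np.ndarray:
    """Return a (heads, N, N) additive attention bias for N = grid_h * grid_w patches.

    Assumption: bias = -slope * Euclidean distance between patch grid positions.
    """
    ys, xs = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=-1).astype(np.float64)  # (N, 2)
    diff = coords[:, None, :] - coords[None, :, :]                           # (N, N, 2)
    dist = np.sqrt((diff ** 2).sum(axis=-1))                                 # (N, N)
    slopes = alibi_slopes(num_heads)                                         # (heads,)
    return -slopes[:, None, None] * dist[None, :, :]


def biased_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                     bias: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention with an additive bias; q, k, v are (heads, N, D)."""
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d) + bias
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because this bias depends only on relative patch distance and not on learned absolute positions, the same function can be called with a larger grid at test time (for example `alibi_2d_bias(6, 8, 8)` after training with a 3x4 grid). That property is consistent with the extrapolation claim in the abstract, though this sketch does not reproduce the reported results.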

Anthony Fuller, Koreen Millard, James R. Green • 2023

Related benchmarks

Task	Dataset	Metric	Result	(unlabeled)
Image Classification	ImageNet-1K	Top-1 Acc	80	1239
Image Classification	ImageNet-1k (val)	Top-1 Accuracy	80	871
Image Classification	ImageNet 1k (test)	Top-1 Accuracy	80	490
Action Recognition	UCF101	Accuracy	41.6	433
Object Detection	COCO	mAP	33.9	137
Semantic Segmentation	Potsdam	mIoU	66.752	110
3D Object Classification	ModelNet40	Top-1 Accuracy	93.2	89
Change Detection	LEVIR	F1 Score	88.5	85
Classification	TreeSatAI-TS	F1 Score	72.09	75
Semantic Segmentation	ScanNet	mIoU	70.6	59

Showing 10 of 116 rows
