SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery

About

Unsupervised pre-training methods for large vision models have been shown to enhance performance on downstream supervised tasks. Developing similar techniques for satellite imagery presents significant opportunities, as unlabelled data is plentiful and the inherent temporal and multi-spectral structure provides avenues to further improve existing pre-training strategies. In this paper, we present SatMAE, a pre-training framework for temporal or multi-spectral satellite imagery based on Masked Autoencoder (MAE). To leverage temporal information, we include a temporal embedding along with independently masking image patches across time. In addition, we demonstrate that encoding multi-spectral data as groups of bands with distinct spectral positional encodings is beneficial. Our approach yields strong improvements over previous state-of-the-art techniques, both in terms of supervised learning performance on benchmark datasets (up to $\uparrow$ 7%), and transfer learning performance on downstream remote sensing tasks, including land cover classification (up to $\uparrow$ 14%) and semantic segmentation. Code and data are available on the project website: https://sustainlab-group.github.io/SatMAE/

Yezhen Cong, Samar Khanna, Chenlin Meng, Patrick Liu, Erik Rozi, Yutong He, Marshall Burke, David B. Lobell, Stefano Ermon • 2022
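
The independent masking across time mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the SatMAE code base: the function name, the frame count, the 196-patch grid, and the 0.75 mask ratio are all assumptions chosen for illustration. The sketch only shows the idea that each image in a temporal sequence draws its own random set of masked patches, rather than sharing one mask across all timestamps.

```python
import random


def independent_temporal_masks(num_frames, num_patches, mask_ratio, seed=0):
    """Return one boolean mask per frame; True marks a masked patch.

    Hypothetical helper for illustration only (not the SatMAE API).
    Each frame samples its masked patch indices independently.
    """
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    masks = []
    for _ in range(num_frames):
        # Draw a fresh random subset of patch indices for this frame.
        masked = set(rng.sample(range(num_patches), num_masked))
        masks.append([i in masked for i in range(num_patches)])
    return masks


# Assumed example values: 3 timestamps, a 14 x 14 patch grid, 75% masking.
masks = independent_temporal_masks(num_frames=3, num_patches=196, mask_ratio=0.75)
for t, frame_mask in enumerate(masks):
    print(f"frame {t}: {sum(frame_mask)} of {len(frame_mask)} patches masked")
```

Because each frame gets its own mask, a patch location hidden at one timestamp will often be visible at another, which is one plausible reading of how the temporal structure can help reconstruction.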

Related benchmarks

Task	Dataset	Result
Image Classification	EuroSAT	Accuracy 83.92	569
Change Detection	LEVIR-CD (test)	F1 Score 87.65	485
Image Classification	RESISC45	Accuracy 94.8	472
Change Detection	LEVIR-CD	F1 Score 87.65	275
Semantic Segmentation	Vaihingen	mIoU 73.64	156
Semantic Segmentation	iSAID	mIoU 62.97	146
Semantic Segmentation	Potsdam	mIoU 73.55	110
Scene Classification	AID TR=50%	Accuracy 96.94	94
Scene Classification	AID TR=20%	Accuracy 95.02	93
Change Detection	LEVIR	F1 Score 91.4	85

Showing 10 of 128 rows
