Cross-Scale MAE: A Tale of Multi-Scale Exploitation in Remote Sensing
About
Remote sensing images present unique challenges to image analysis due to their extensive geographic coverage, hardware limitations, and misaligned multi-scale imagery. This paper revisits the classical multi-scale representation learning problem, but under the general framework of self-supervised learning for remote sensing image understanding. We present Cross-Scale MAE, a self-supervised model built upon the Masked Auto-Encoder (MAE). During pre-training, Cross-Scale MAE employs scale augmentation and enforces cross-scale consistency through both contrastive and generative losses, ensuring consistent and meaningful representations well-suited for a wide range of downstream tasks. Further, our implementation leverages the xFormers library to accelerate network pre-training on a single GPU while maintaining the quality of learned representations. Experimental evaluations demonstrate that Cross-Scale MAE outperforms standard MAE and other state-of-the-art remote sensing MAE methods.
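The abstract describes a pre-training objective that combines a contrastive consistency term between embeddings of two scale-augmented views with a generative (masked-reconstruction) term, as in MAE. The following is a minimal NumPy sketch of what such a combined loss could look like; the function names (`info_nce`, `masked_mse`, `cross_scale_loss`) and the weighting scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def info_nce(z_lo, z_hi, temperature=0.07):
    """Symmetric-in-spirit InfoNCE between matched embeddings of two
    scale views. z_lo, z_hi: (N, D) arrays; row i of each comes from
    the same image at a different scale (the positive pair)."""
    z_lo = z_lo / np.linalg.norm(z_lo, axis=1, keepdims=True)
    z_hi = z_hi / np.linalg.norm(z_hi, axis=1, keepdims=True)
    logits = z_lo @ z_hi.T / temperature            # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))              # positives on diagonal

def masked_mse(pred, target, mask):
    """Generative loss: MSE over masked patches only, as in standard MAE.
    pred, target: (N, P, D) patch pixels; mask: (N, P) with 1 = masked."""
    per_patch_err = ((pred - target) ** 2).mean(axis=-1)
    return (per_patch_err * mask).sum() / mask.sum()

def cross_scale_loss(z_lo, z_hi, pred, target, mask, lam=1.0):
    """Hypothetical total objective: cross-scale contrastive consistency
    plus masked reconstruction, weighted by lam (an assumed hyperparameter)."""
    return info_nce(z_lo, z_hi) + lam * masked_mse(pred, target, mask)
```

In this sketch, perfect reconstruction drives the generative term to zero, while the contrastive term pulls embeddings of the two scales of the same scene together and pushes different scenes apart.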
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | EuroSAT | Accuracy | 84.01 | 569 |
| Semantic Segmentation | Vaihingen | mIoU | 76.03 | 140 |
| Semantic Segmentation | Potsdam | mIoU | 76.17 | 81 |
| Image Classification | WHU-RS19 | Accuracy | 79.8 | 60 |
| Image Classification | fMoW (val) | Accuracy | 71.4 | 34 |
| Image Classification | UC Merced | Accuracy (KNN) | 93.1 | 31 |
| Image Classification | RESISC-45 (val) | Top-1 Accuracy | 91.1 | 22 |
| Image Classification | FireRisk (val) | Accuracy | 61.6 | 20 |
| Image Classification | ForestNet (val) | Accuracy | 49.7 | 20 |
| Semantic Segmentation | PASTIS-HD (val) | mIoU | 31.4 | 20 |