SMARTIES: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images
About
From optical sensors to microwave radars, leveraging the complementary strengths of remote sensing (RS) sensors is crucial for dense spatio-temporal monitoring of our planet. Yet recent deep learning models, whether task-specific or foundational, are often tied to a single sensor or to a fixed combination of sensors: adapting them to different sensory inputs requires both architectural changes and re-training, which limits scalability and generalization across RS sensors. Instead, a single model that modulates its feature representations to accept diverse sensors as input would pave the way to agile and flexible multi-sensor RS data processing. To address this, we introduce SMARTIES, a generic and versatile foundation model that removes the need for sensor-specific effort and scales and generalizes to diverse RS sensors. SMARTIES projects data from heterogeneous sensors into a shared spectrum-aware space, which allows arbitrary combinations of bands to be used for both training and inference. To obtain sensor-agnostic representations, we train a single, unified transformer to reconstruct masked multi-sensor data using cross-sensor token mixup. On both single- and multi-modal tasks across diverse sensors, SMARTIES outperforms previous models that rely on sensor-specific pretraining. Our code and pretrained models are available at https://gsumbul.github.io/SMARTIES.
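The abstract describes its two key mechanisms (a shared spectrum-aware projection and masked reconstruction with cross-sensor token mixup) only at a high level. The plain-Python sketch below illustrates one possible reading of them and is not the SMARTIES implementation. It assumes that each band is described by its wavelength interval, that the shared space is a fixed set of wavelength ranges each owning a projection vector, and that token mixup picks, at every spatial position, the token of one of two co-registered sensors. All names (`Band`, `spectrum_aware_embed`, `cross_sensor_token_mixup`, `random_mask`) and the example ranges and values are hypothetical.

```python
# Illustrative sketch under stated assumptions; NOT the SMARTIES implementation.
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# A wavelength interval in nanometres: (lower edge, upper edge).
Range = Tuple[float, float]


@dataclass
class Band:
    """Hypothetical description of one sensor band by its spectral interval."""
    name: str
    low_nm: float
    high_nm: float


def overlap(a: Range, b: Range) -> float:
    """Length of the intersection of two wavelength intervals (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def spectrum_aware_embed(
    values: Dict[str, float],
    bands: Sequence[Band],
    ranges: Sequence[Range],
    weights: Sequence[Sequence[float]],
) -> List[float]:
    """Map any set of bands to one fixed-size vector in a shared space.

    weights[i] is the (assumed) projection vector owned by shared range i.
    Each band value is projected with every range it overlaps, weighted by the
    fraction of the band inside that range; results are averaged over bands.
    """
    dim = len(weights[0])
    out = [0.0] * dim
    used = 0
    for band in bands:
        span = band.high_nm - band.low_nm
        fracs = [overlap((band.low_nm, band.high_nm), r) / span for r in ranges]
        if sum(fracs) == 0.0:
            continue  # band lies outside every shared range: ignore it
        used += 1
        for frac, w in zip(fracs, weights):
            for d in range(dim):
                out[d] += frac * w[d] * values[band.name]
    return [v / used for v in out] if used else out


def cross_sensor_token_mixup(
    tokens_a: Sequence[Sequence[float]],
    tokens_b: Sequence[Sequence[float]],
    p_a: float,
    rng: random.Random,
) -> Tuple[List[List[float]], List[str]]:
    """Mix tokens of two co-registered sensors position by position.

    At each spatial position, sensor A's token is kept with probability p_a;
    otherwise sensor B's token at the same position is used instead.
    """
    if len(tokens_a) != len(tokens_b):
        raise ValueError("both sensors must be tokenised on the same grid")
    mixed: List[List[float]] = []
    source: List[str] = []
    for tok_a, tok_b in zip(tokens_a, tokens_b):
        if rng.random() < p_a:
            mixed.append(list(tok_a))
            source.append("A")
        else:
            mixed.append(list(tok_b))
            source.append("B")
    return mixed, source


def random_mask(n_tokens: int, ratio: float, rng: random.Random) -> List[int]:
    """Sorted indices of tokens hidden from the encoder, to be reconstructed."""
    return sorted(rng.sample(range(n_tokens), int(n_tokens * ratio)))


# Usage with made-up ranges, projections, and band values.
ranges = [(400.0, 500.0), (500.0, 600.0), (600.0, 700.0)]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bands = [Band("blue", 450.0, 500.0), Band("red", 650.0, 680.0)]
embedding = spectrum_aware_embed({"blue": 2.0, "red": 4.0}, bands, ranges, weights)
rng = random.Random(0)
mixed, source = cross_sensor_token_mixup([[1.0]] * 8, [[2.0]] * 8, 0.5, rng)
masked = random_mask(len(mixed), 0.75, rng)
```

Averaging over bands keeps the output size independent of how many bands a sensor provides, which is one simple way to accept arbitrary band combinations; the actual model may project and aggregate bands differently.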
Related benchmarks
| Task | Dataset | Metric | Value | Leaderboard rank |
|---|---|---|---|---|
| Semantic segmentation | PANGAEA (val) | BurnSr | 82.8 | 18 |
| Remote Sensing Image Classification | m-eurosat | Accuracy | 92.6 | 7 |
| Remote Sensing Image Classification | m-bigearthnet | Accuracy | 62 | 7 |
| Remote Sensing Image Classification | Sentinel-2 benchmark suite | Rank | 2.6 | 7 |
| Semantic segmentation | 11 Remote Sensing Benchmark Datasets 1.0 (aggregated) | Average Rank | 2.6 | 7 |
| Remote Sensing Image Classification | m-SA crop-type | Accuracy | 24.3 | 7 |
| Remote Sensing Image Classification | m-cashew | Accuracy | 12.7 | 7 |
| Semantic segmentation | SegMunich in-distribution (test) | mIoU | 39.1 | 6 |
| Image Classification | m-forestnet (test) | Accuracy | 49.8 | 4 |
| Image Classification | LoveDA Urban (test) | Accuracy | 13.5 | 4 |