# SSL4EO-S12: A Large-Scale Multi-Modal, Multi-Temporal Dataset for Self-Supervised Learning in Earth Observation

## About
Self-supervised pre-training has the potential to generate expressive representations without human annotation. However, most pre-training in Earth observation (EO) is based on ImageNet or on medium-sized, labeled remote sensing (RS) datasets. We share SSL4EO-S12 (Self-Supervised Learning for Earth Observation - Sentinel-1/2), an unlabeled RS dataset that assembles a large-scale, global, multimodal, and multi-seasonal corpus of satellite imagery from the ESA Sentinel-1 & -2 missions. We demonstrate that SSL4EO-S12 succeeds in self-supervised pre-training with a set of methods: MoCo-v2, DINO, MAE, and data2vec. The resulting models yield downstream performance close to, or surpassing, that of supervised learning. In addition, pre-training on SSL4EO-S12 outperforms pre-training on existing datasets. We make the dataset, related source code, and pre-trained models openly available at https://github.com/zhu-xlab/SSL4EO-S12.
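To illustrate the contrastive pre-training objective behind one of the listed methods, the following is a minimal NumPy sketch of the MoCo-style InfoNCE loss: two augmented "views" of the same image patch form a positive pair, while a queue of embeddings from other patches serves as negatives. This is a simplified illustration, not code from the SSL4EO-S12 repository; the function names, the toy embeddings, and the temperature value are our own.

```python
import numpy as np

def l2_normalize(x):
    """Project embeddings onto the unit sphere, as MoCo does before scoring."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def info_nce_loss(q, k, queue, temperature=0.2):
    """MoCo-style InfoNCE loss (illustrative sketch).

    q: (N, D) query embeddings; k: (N, D) positive key embeddings;
    queue: (K, D) negative keys from the memory bank. All L2-normalized.
    """
    # Positive logits: similarity between each query and its own key.
    l_pos = np.sum(q * k, axis=1, keepdims=True)      # (N, 1)
    # Negative logits: similarity against every key in the queue.
    l_neg = q @ queue.T                               # (N, K)
    logits = np.concatenate([l_pos, l_neg], axis=1) / temperature
    # Cross-entropy with the positive always at index 0.
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[:, 0].mean())

rng = np.random.default_rng(0)
q = l2_normalize(rng.normal(size=(8, 32)))
k = l2_normalize(q + 0.05 * rng.normal(size=(8, 32)))  # second view of the same patch
queue = l2_normalize(rng.normal(size=(128, 32)))       # embeddings of unrelated patches
print(info_nce_loss(q, k, queue))
```

Because the positive key is a lightly perturbed copy of the query, its loss is much lower than it would be for a random, unrelated key, which is exactly the signal the encoder is trained to amplify.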
## Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Change Detection | LEVIR-CD | F1 Score | 89.05 | 188 |
| Semantic Segmentation | iSAID | mIoU | 64.01 | 68 |
| Object Detection | DIOR | mAP50 | 64.82 | 50 |
| Classification | AID (test) | Top-1 Accuracy | 94.74 | 41 |
| Scene Classification | RESISC-45 (test) | OA | 91.27 | 26 |
| Change Detection | OSCD | F1 Score | 35.08 | 26 |
| Object Detection | DIOR-R | mAP | 61.23 | 21 |
| Object Detection | FAIR1M v2.0 | -- | -- | 20 |
| Scene Classification | BEN-S2 (test) | mAP | 91.8 | 19 |
| Semantic Segmentation | Dyna.-Pla. (val) | mIoU | 35.3 | 17 |