Can Generative Geospatial Diffusion Models Excel as Discriminative Geospatial Foundation Models?
About
Self-supervised learning (SSL) has revolutionized representation learning in Remote Sensing (RS), advancing Geospatial Foundation Models (GFMs) to leverage vast unlabeled satellite imagery for diverse downstream tasks. Currently, GFMs primarily employ objectives like contrastive learning or masked image modeling, owing to their proven success in learning transferable representations. However, generative diffusion models, which demonstrate the potential to capture multi-grained semantics essential for RS tasks during image generation, remain underexplored for discriminative applications. This prompts the question: can generative diffusion models also excel and serve as GFMs with sufficient discriminative power? In this work, we answer this question with SatDiFuser, a framework that transforms a diffusion-based generative geospatial foundation model into a powerful pretraining tool for discriminative RS. By systematically analyzing multi-stage, noise-dependent diffusion features, we develop three fusion strategies to effectively leverage these diverse representations. Extensive experiments on remote sensing benchmarks show that SatDiFuser outperforms state-of-the-art GFMs, achieving gains of up to +5.7% mIoU in semantic segmentation and +7.9% F1-score in classification, demonstrating the capacity of diffusion-based generative foundation models to rival or exceed discriminative GFMs. The source code is available at: https://github.com/yurujaja/SatDiFuser.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Change Detection | LEVIR | F1 Score | 90.2 | 62 |
| Change Detection | OSCD | -- | -- | 26 |
| Semantic segmentation | SpaceNet v1 | macro mIoU | 77.17 | 20 |
| Semantic segmentation | Geo-Bench | mIoU (nz-cattle, macro) | 82.98 | 10 |
| Semantic segmentation | Optical and Multispectral Segmentation Summary | mIoU (Optical, Macro) | 81.76 | 10 |
| Multi-Label Classification | GB-BEN | F1 Score | 43.75 | 10 |
| Semantic segmentation | PASTIS | Macro mIoU | 17.65 | 10 |
| Multispectral Classification | GEO-Bench m-bigearthnet, m-so2sat, m-eurosat (test) | F1 Score (GB-ben) | 0.4997 | 10 |
| Semantic segmentation | GEO-Bench SA-c | Macro mIoU | 20.87 | 10 |
| Semantic segmentation | Sen1Floods11 | mIoU (macro) | 84.08 | 10 |
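
Most segmentation rows above report a "macro" mIoU. The page does not state the averaging convention behind each entry, so the following is only a minimal sketch of one common reading: compute intersection-over-union per class, then take the unweighted mean over the classes that occur. The function name `macro_miou` and the flat label-list input are illustrative assumptions, not part of SatDiFuser or of any benchmark's official scoring code.

```python
from typing import Sequence


def macro_miou(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """Unweighted mean of per-class IoU (assumed convention, see lead-in).

    y_true and y_pred are flattened per-pixel class indices in [0, num_classes).
    Classes absent from both labels and predictions are skipped.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    intersection = [0] * num_classes  # pixels where label and prediction agree on class c
    union = [0] * num_classes         # pixels where label or prediction is class c

    for t, p in zip(y_true, y_pred):
        if t == p:
            intersection[t] += 1
            union[t] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[t] += 1
            union[p] += 1

    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    preds = [0, 1, 1, 1]
    print(f"macro mIoU: {macro_miou(labels, preds, num_classes=2):.4f}")
```

Because every class contributes equally regardless of pixel count, rare classes affect this score as much as dominant ones. Some leaderboards may instead use "macro" to mean an average across datasets, in which case the per-dataset scores would be averaged in a second step.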