HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model
About
Accurate hyperspectral image (HSI) interpretation is critical for providing valuable insights into various earth observation-related applications such as urban planning, precision agriculture, and environmental monitoring. However, existing HSI processing methods are predominantly task-specific and scene-dependent, which severely limits their ability to transfer knowledge across tasks and scenes and thereby reduces their practicality in real-world applications. To address these challenges, we present HyperSIGMA, a vision transformer-based foundation model that unifies HSI interpretation across tasks and scenes and scales to over one billion parameters. To overcome the spectral and spatial redundancy inherent in HSIs, we introduce a novel sparse sampling attention (SSA) mechanism, which effectively promotes the learning of diverse contextual features and serves as the basic block of HyperSIGMA. HyperSIGMA integrates spatial and spectral features using a specially designed spectral enhancement module. In addition, we construct a large-scale hyperspectral dataset, HyperGlobal-450K, for pre-training, which contains about 450K hyperspectral images, significantly surpassing existing datasets in scale. Extensive experiments on various high-level and low-level HSI tasks demonstrate HyperSIGMA's versatility and superior representational capability compared to current state-of-the-art methods. Moreover, HyperSIGMA shows significant advantages in scalability, robustness, cross-modal transfer capability, real-world applicability, and computational efficiency. The code and models will be released at https://github.com/WHU-Sigma/HyperSIGMA.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Hyperspectral Image Classification | Pavia University (test) | -- | -- | 103 |
| Hyperspectral Classification | WHU-Hi Hanchuan (test) | Average Accuracy | 64.3 | 31 |
| Cloud Optical Thickness (COT) Regression | HyperFM250k | MSE | 0.3212 | 14 |
| Cloud Water Path (CWP) Regression | HyperFM250k | MSE | 1.3317 | 14 |
| Cloud Effective Radius (CER) Regression | HyperFM250k | MSE | 95.4874 | 14 |
| Cloud Top Height (CTH) Regression | HyperFM250k | MSE | 8.4936 | 14 |
| Scene Classification | HRSSC (test) | OA | 81.85 | 11 |
| Semantic segmentation | OxHyperMinerals | mIoU | 65.6 | 11 |
| Semantic segmentation | EnMAP CDL | mIoU | 58.4 | 11 |
| Semantic segmentation | EnMAP BD-Foret | mIoU | 52.4 | 11 |
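
The table reports four kinds of metric: overall accuracy (OA), average accuracy, mean intersection over union (mIoU), and mean squared error (MSE). The exact evaluation protocol behind each row is not stated here, so the following is only a minimal sketch of the textbook definitions, not the benchmarks' own evaluation code. The function names and the toy labels are illustrative assumptions, and the functions return fractions, whereas the table appears to list OA, average accuracy, and mIoU as percentages.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the reference label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def average_accuracy(y_true, y_pred):
    """Mean of per-class accuracy, so every class counts equally."""
    total = defaultdict(int)
    hit = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += int(t == p)
    return sum(hit[c] / total[c] for c in total) / len(total)


def mean_iou(y_true, y_pred):
    """Mean over classes of |intersection| / |union| of predicted and reference sets."""
    pairs = list(zip(y_true, y_pred))
    classes = set(y_true) | set(y_pred)
    ious = []
    for c in classes:
        inter = sum(t == c and p == c for t, p in pairs)
        union = sum(t == c or p == c for t, p in pairs)
        ious.append(inter / union)  # union >= 1 because c occurs somewhere
    return sum(ious) / len(ious)


def mse(y_true, y_pred):
    """Mean squared error between reference and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels (made up for illustration, not taken from any benchmark).
labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [0, 1, 1, 1, 2, 0]

oa = overall_accuracy(labels_true, labels_pred)
aa = average_accuracy(labels_true, labels_pred)
miou = mean_iou(labels_true, labels_pred)
err = mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
```

OA weights every sample equally, while average accuracy and mIoU weight every class equally, which is why the latter two are often preferred on class-imbalanced hyperspectral scenes. For the regression rows, lower MSE is better, and values are only comparable within the same target variable and unit.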