HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

About

Accurate hyperspectral image (HSI) interpretation is critical for providing valuable insights into Earth observation applications such as urban planning, precision agriculture, and environmental monitoring. However, existing HSI processing methods are predominantly task-specific and scene-dependent, which severely limits their ability to transfer knowledge across tasks and scenes and reduces their practicality in real-world applications. To address these challenges, we present HyperSIGMA, a vision transformer-based foundation model that unifies HSI interpretation across tasks and scenes and scales to over one billion parameters. To overcome the spectral and spatial redundancy inherent in HSIs, we introduce a novel sparse sampling attention (SSA) mechanism, which effectively promotes the learning of diverse contextual features and serves as the basic block of HyperSIGMA. HyperSIGMA integrates spatial and spectral features using a specially designed spectral enhancement module. In addition, we construct a large-scale hyperspectral dataset, HyperGlobal-450K, for pre-training, which contains about 450K hyperspectral images, significantly surpassing existing datasets in scale. Extensive experiments on various high-level and low-level HSI tasks demonstrate HyperSIGMA's versatility and superior representational capability compared to current state-of-the-art methods. Moreover, HyperSIGMA shows significant advantages in scalability, robustness, cross-modal transfer capability, real-world applicability, and computational efficiency. The code and models will be released at https://github.com/WHU-Sigma/HyperSIGMA.

Di Wang, Meiqi Hu, Yao Jin, Yuchun Miao, Jiaqi Yang, Yichu Xu, Xiaolei Qin, Jiaqi Ma, Lingyu Sun, Chenxing Li, Chuan Fu, Hongruixuan Chen, Chengxi Han, Naoto Yokoya, Jing Zhang, Minqiang Xu, Lin Liu, Lefei Zhang, Chen Wu, Bo Du, Dacheng Tao, Liangpei Zhang • 2024

Related benchmarks

Task	Dataset	Result
Hyperspectral Image Classification	Pavia University (test)	--	103
Hyperspectral Classification	WHU-Hi Hanchuan (test)	Average Accuracy 64.3	31
Semantic segmentation	MTS12 N→S	mIoU 29.72	17
Semantic segmentation	WHUOHS w/o domain gap	mIoU 57.7	17
Cloud Optical Thickness (COT) Regression	HyperFM250k	MSE 0.3212	14
Cloud Water Path (CWP) Regression	HyperFM250k	MSE 1.3317	14
Cloud Effective Radius (CER) Regression	HyperFM250k	MSE 95.4874	14
Cloud Top Height (CTH) Regression	HyperFM250k	MSE 8.4936	14
Semantic segmentation	MTS12 S→N	mIoU 26.4	13
Scene Classification	HRSSC (test)	OA 81.85	11

Showing 10 of 37 rows
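
The table above reports results as overall accuracy (OA), Average Accuracy, mIoU, and MSE. The exact evaluation protocol of each benchmark is not stated on this page, so the following is only a minimal sketch of the conventional definitions of these metrics, assuming integer class labels and NumPy. The function names (`confusion_matrix`, `classification_metrics`, `mse`) are hypothetical helpers for illustration and are not taken from the HyperSIGMA code base. Scores are returned as percentages on the assumption that the table uses that scale.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count label pairs; rows are reference classes, columns are predictions."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Encode each (reference, prediction) pair as a single flat index.
    flat = y_true * num_classes + y_pred
    counts = np.bincount(flat, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def classification_metrics(cm):
    """Return OA, Average Accuracy and mIoU (in percent) from a confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    correct = np.diag(cm)        # correctly labelled samples per class
    reference = cm.sum(axis=1)   # samples that truly belong to each class
    predicted = cm.sum(axis=0)   # samples assigned to each class

    overall_accuracy = correct.sum() / cm.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        per_class_accuracy = correct / reference
        per_class_iou = correct / (reference + predicted - correct)

    # Assumption: classes with an undefined score (0/0) are skipped in the means.
    return {
        "OA": 100.0 * overall_accuracy,
        "AA": 100.0 * np.nanmean(per_class_accuracy),
        "mIoU": 100.0 * np.nanmean(per_class_iou),
    }


def mse(y_true, y_pred):
    """Mean squared error for regression targets such as cloud properties."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))
```

OA weights every sample equally, while Average Accuracy and mIoU average over classes, so the latter two are more sensitive to errors on rare classes. This is one reason a benchmark may report more than one of them.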
