FusionMamba: Efficient Remote Sensing Image Fusion with State Space Model

About

Remote sensing image fusion aims to generate a high-resolution multi/hyper-spectral image by combining a high-resolution image with limited spectral data and a low-resolution image rich in spectral information. Current deep learning (DL) methods typically employ convolutional neural networks (CNNs) or Transformers for feature extraction and information integration. While CNNs are efficient, their limited receptive fields restrict their ability to capture global context. Transformers excel at learning global information but are computationally expensive. Recent advancements in the state space model (SSM), particularly Mamba, present a promising alternative by enabling global perception with low complexity. However, the potential of SSM for information integration remains largely unexplored. Therefore, we propose FusionMamba, an innovative method for efficient remote sensing image fusion. Our contributions are twofold. First, to effectively merge spatial and spectral features, we expand the single-input Mamba block to accommodate dual inputs, creating the FusionMamba block, which serves as a plug-and-play solution for information integration. Second, we incorporate Mamba and FusionMamba blocks into an interpretable network architecture tailored for remote sensing image fusion. Our design uses two U-shaped network branches, each primarily composed of four-directional Mamba blocks, to extract spatial and spectral features separately and hierarchically. The resulting feature maps are thoroughly merged in an auxiliary network branch constructed with FusionMamba blocks. Furthermore, we improve the representation of spectral information through an enhanced channel attention module. Quantitative and qualitative evaluation results across six datasets demonstrate that our method achieves SOTA performance. The code is available at https://github.com/PSRben/FusionMamba.
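As a rough illustration of what "four-directional" scanning can mean for a 2D feature map, the sketch below flattens an (H, W, C) array into four token sequences: row-major, column-major, and the reverse of each. This ordering is an assumption borrowed from common practice in visual state space models; the abstract does not specify the scan orders FusionMamba uses, and the function name `four_direction_sequences` is hypothetical and not taken from the authors' repository.

```python
import numpy as np


def four_direction_sequences(feature_map):
    """Flatten an (H, W, C) feature map into four (H*W, C) token sequences.

    Assumed scan orders (illustrative only, not confirmed by the paper):
      1. row-major: left to right, then top to bottom
      2. reversed row-major
      3. column-major: top to bottom, then left to right
      4. reversed column-major
    """
    h, w, c = feature_map.shape
    # Row-major order is NumPy's default flattening order.
    row_major = feature_map.reshape(h * w, c)
    # Swapping the spatial axes before flattening gives column-major order.
    col_major = feature_map.transpose(1, 0, 2).reshape(h * w, c)
    return [row_major, row_major[::-1], col_major, col_major[::-1]]


if __name__ == "__main__":
    demo = np.arange(6).reshape(2, 3, 1)  # a 2x3 map with one channel
    for seq in four_direction_sequences(demo):
        print(seq[:, 0].tolist())
```

Each sequence would then be processed by a 1D sequence model and the outputs mapped back to the 2D grid; that step is omitted here because its details depend on the actual implementation.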

Siran Peng, Xiangyu Zhu, Haoyu Deng, Liang-Jian Deng, Zhen Lei • 2024

Related benchmarks

Task	Dataset	Metric	Value	
Pansharpening	WorldView-3 full-resolution original (test)	D_lambda	0.0183	95
Infrared-Visible Image Fusion	RoadScene (test)	Visual Information Fidelity (VIF)	0.635	53
Visible-Infrared Image Fusion	MSRS (test)	Average Gradient (AG)	3.599	43
Pansharpening	WorldView-3 Full Resolution	Dλ (Spectral Divergence)	0.019	28
Pansharpening	WorldView-2 (WV2) Real Data Full Resolution (test)	D_lambda	0.0526	25
Pansharpening	WorldView-3 Reduced-resolution (test)	SAM	2.844	17
Pansharpening	QuickBird (QB) reduced-resolution (test)	SAM	4.61	17
Semantic segmentation	MSRS (test)	Background Score	98.4	17
Pansharpening	GaoFen-2 real-world data (test)	HQNR	0.9536	14
Infrared-Visible Image Fusion	M3FD (test)	MI	4.044	14

Showing 10 of 18 rows
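For readers unfamiliar with the SAM column above, the Spectral Angle Mapper is the average angle between the spectral vectors of a reference image and a fused image at each pixel; lower is better. The sketch below is a minimal NumPy version that reports the angle in degrees. The function name, the (H, W, C) layout, and the `eps` guard are assumptions made for illustration; the benchmark's own evaluation code may differ in details such as how zero-valued pixels are handled.

```python
import numpy as np


def spectral_angle_mapper(reference, fused, eps=1e-12):
    """Mean spectral angle, in degrees, between two (H, W, C) images."""
    # Treat every pixel as a C-dimensional spectral vector.
    ref = np.asarray(reference, dtype=np.float64).reshape(-1, reference.shape[-1])
    est = np.asarray(fused, dtype=np.float64).reshape(-1, fused.shape[-1])
    dot = np.sum(ref * est, axis=1)
    norms = np.linalg.norm(ref, axis=1) * np.linalg.norm(est, axis=1)
    # Guard against division by zero and clip rounding errors before arccos.
    cos = np.clip(dot / np.maximum(norms, eps), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.random((4, 4, 8)) + 0.1
    # A uniformly rescaled image keeps every spectral direction unchanged,
    # so the angle should be numerically close to zero.
    print(spectral_angle_mapper(ref, 2.0 * ref))
```

Because the metric depends only on the direction of each spectral vector, it measures spectral distortion independently of overall brightness.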
