Text-Guided Channel Perturbation and Pretrained Knowledge Integration for Unified Multi-Modality Image Fusion

About

Multi-modality image fusion enhances scene perception by combining complementary information from different modalities. Unified models share parameters across modalities, but large modality differences often cause gradient conflicts that limit performance. Some methods introduce modality-specific encoders to strengthen feature perception and improve fusion quality; however, this strategy reduces generalisation across different fusion tasks. To overcome this limitation, we propose UP-Fusion, a unified multi-modality image fusion framework based on channel perturbation and pretrained knowledge integration. To suppress redundant modal information and emphasise key features, we propose the Semantic-Aware Channel Pruning Module (SCPM), which leverages the semantic perception capability of a pretrained model to filter and enhance multi-modality feature channels. Furthermore, we propose the Geometric Affine Modulation Module (GAM), which uses the original modal features to apply affine transformations to the initial fusion features, preserving the modal discriminability of the feature encoder. Finally, we apply a Text-Guided Channel Perturbation Module (TCPM) during decoding to reshape the channel distribution and reduce dependence on modality-specific channels. Extensive experiments show that the proposed algorithm outperforms existing methods on both multi-modality image fusion and downstream tasks.
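The abstract describes three channel-level operations only at a high level. The NumPy sketch below illustrates one plausible reading of each under stated assumptions; it is not the authors' implementation. The function names, the top-k plus sigmoid gating, the AdaIN-style use of per-channel mean and standard deviation as affine parameters, and the Gaussian multiplicative noise are all hypothetical choices. The pretrained-model scores and text-derived weights are random stand-ins.

```python
import numpy as np


def semantic_channel_gate(feat, sem_scores, keep_ratio=0.5):
    """SCPM-like channel pruning (hypothetical sketch).

    Keeps the k channels with the highest relevance scores and scales each
    kept channel by sigmoid(score); all other channels are zeroed.
    feat: (C, H, W) array; sem_scores: (C,) scores that a pretrained model
    is assumed to provide (the model itself is not modelled here).
    """
    c = feat.shape[0]
    k = max(1, int(round(keep_ratio * c)))
    keep = np.argsort(sem_scores)[-k:]            # indices of the top-k scores
    mask = np.zeros(c)
    mask[keep] = 1.0
    weights = mask / (1.0 + np.exp(-sem_scores))  # sigmoid weight, kept channels only
    return feat * weights[:, None, None]


def affine_modulate(fused, modal, eps=1e-5):
    """GAM-like modulation (hypothetical sketch, AdaIN-style).

    Normalises each fused channel, then applies a per-channel affine map
    whose scale and shift are the std and mean of the matching channel of
    an original modality feature.
    """
    gamma = modal.std(axis=(1, 2))[:, None, None]
    beta = modal.mean(axis=(1, 2))[:, None, None]
    mu = fused.mean(axis=(1, 2), keepdims=True)
    sigma = fused.std(axis=(1, 2), keepdims=True)
    return gamma * (fused - mu) / (sigma + eps) + beta


def text_guided_perturb(feat, text_weights, strength=0.1, rng=None):
    """TCPM-like perturbation (hypothetical sketch).

    Multiplies each channel by 1 + strength * w_c * n_c, where n_c ~ N(0, 1)
    and w_c in [0, 1] is a channel weight assumed to come from a text
    embedding. Intended for training only; strength=0 is the identity.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(feat.shape[0])
    scale = 1.0 + strength * text_weights * noise
    return feat * scale[:, None, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ir, vis = rng.standard_normal((2, 8, 16, 16))  # toy infrared / visible features
    scores = rng.standard_normal(8)                 # stand-in pretrained-model scores
    text_w = rng.uniform(0.0, 1.0, 8)               # stand-in text-derived weights

    # Plain averaging is a toy stand-in for the initial fusion features.
    fused = semantic_channel_gate(0.5 * (ir + vis), scores, keep_ratio=0.5)
    fused = affine_modulate(fused, ir)
    fused = text_guided_perturb(fused, text_w, strength=0.1, rng=rng)
    print(fused.shape)  # (8, 16, 16)
```

Because the perturbation only rescales channels, it leaves tensor shapes unchanged and reduces to the identity when `strength=0`, so in this sketch it can simply be switched off at inference time.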

Xilai Li, Xiaosong Li, Weijun Jiang • 2025

Related benchmarks

Task	Dataset	Metric	Result	
Infrared-Visible Image Fusion	MSRS	QAB/F (edge-based fusion quality)	0.6859	38
Video Fusion	VTMOT	QG	51.54	13
Infrared and Visible Video Fusion	HDO (test)	QG	0.5997	10
Infrared and Visible Video Fusion	M3SVD (test)	QG	0.6153	10
Image Fusion	Efficiency Evaluation	FLOPs (G)	953.9	10
Infrared and Visible Video Fusion	HDO	QMI	0.4603	8
Infrared and Visible Video Fusion	M3SVD	QMI	57.74	8
Infrared and Visible Video Fusion	VTMOT	QMI	0.502	8
Infrared and Visible Image Fusion	LLVIP	QG	61.41	5
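QMI in the table is a mutual-information-based fusion metric, but the page does not say which variant or scaling each benchmark uses (reported values range from 0.4603 to 57.74). The sketch below assumes 8-bit grayscale inputs and a 256-bin histogram estimate, and shows two common forms: the plain sum I(A;F) + I(B;F) and a commonly used normalised form. The function names are illustrative, not any benchmark's official code.

```python
import numpy as np


def entropy(p):
    """Shannon entropy in bits of a probability array (zeros ignored)."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def mutual_information(x, y, bins=256):
    """Histogram estimate of I(X;Y) = H(X) + H(Y) - H(X,Y) for 8-bit images."""
    joint, _, _ = np.histogram2d(
        x.ravel(), y.ravel(), bins=bins, range=[[0, 256], [0, 256]]
    )
    pxy = joint / joint.sum()
    return entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0)) - entropy(pxy.ravel())


def image_entropy(img, bins=256):
    """Entropy in bits of a single image's grey-level histogram."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    return entropy(hist / hist.sum())


def q_mi(a, b, f, bins=256, normalized=False):
    """MI-based fusion score for source images a, b and fused image f.

    normalized=False: I(A;F) + I(B;F)
    normalized=True:  2 * [I(A;F) / (H(A) + H(F)) + I(B;F) / (H(B) + H(F))]
    """
    mi_af = mutual_information(a, f, bins)
    mi_bf = mutual_information(b, f, bins)
    if not normalized:
        return mi_af + mi_bf
    ha, hb, hf = (image_entropy(img, bins) for img in (a, b, f))
    return 2.0 * (mi_af / (ha + hf) + mi_bf / (hb + hf))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ir = rng.integers(0, 256, (64, 64))
    vis = rng.integers(0, 256, (64, 64))
    fused = ((ir.astype(float) + vis) / 2).round()  # toy average fusion
    print(q_mi(ir, vis, fused), q_mi(ir, vis, fused, normalized=True))
```

In the normalised form each term is bounded by 0.5, so a fused image identical to both sources scores 2.0; the unnormalised sum is in bits and grows with image entropy, which is one reason scores from different papers are not directly comparable.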
