Text-Guided Channel Perturbation and Pretrained Knowledge Integration for Unified Multi-Modality Image Fusion

About

Multi-modality image fusion enhances scene perception by combining complementary information from different modalities. Unified models aim to share parameters across modalities, but large modality differences often cause gradient conflicts that limit performance. Some methods introduce modality-specific encoders to enhance feature perception and improve fusion quality. However, this strategy reduces generalization across different fusion tasks. To overcome this limitation, we propose a unified multi-modality image fusion framework based on channel perturbation and pre-trained knowledge integration (UP-Fusion). To suppress redundant modal information and emphasize key features, we propose the Semantic-Aware Channel Pruning Module (SCPM), which leverages the semantic perception capability of a pre-trained model to filter and enhance multi-modality feature channels. Furthermore, we propose the Geometric Affine Modulation Module (GAM), which uses the original modal features to apply affine transformations to the initial fusion features, preserving the modal discriminability of the feature encoder. Finally, we apply a Text-Guided Channel Perturbation Module (TCPM) during decoding to reshape the channel distribution, reducing the dependence on modality-specific channels. Extensive experiments demonstrate that the proposed algorithm outperforms existing methods on both multi-modality image fusion and downstream tasks.
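The abstract names three channel-level operations without giving their internals, so the following is only a rough NumPy sketch of the general ideas, not the authors' implementation. All function names, the `keep_ratio` parameter, the statistics used for scale and shift, and the projection matrix are hypothetical stand-ins: `channel_gate` illustrates score-based channel filtering (in the spirit of SCPM), `affine_modulate` illustrates a per-channel scale and shift derived from source features (in the spirit of GAM), and `reweight_channels_from_text` illustrates channel reweighting conditioned on a text embedding (in the spirit of TCPM). In the paper these would presumably be learned modules.

```python
import numpy as np


def channel_gate(features, channel_scores, keep_ratio=0.5):
    """Zero out all but the highest-scoring channels.

    features: array of shape (C, H, W).
    channel_scores: array of shape (C,), e.g. importance scores that a
        pre-trained model might supply (assumption for illustration).
    """
    num_channels = features.shape[0]
    num_keep = max(1, int(round(keep_ratio * num_channels)))
    keep_idx = np.argsort(channel_scores)[-num_keep:]
    mask = np.zeros(num_channels, dtype=features.dtype)
    mask[keep_idx] = 1.0
    return features * mask[:, None, None]


def affine_modulate(fused, source):
    """Apply a per-channel affine transform to `fused`.

    The scale and shift are computed here from simple statistics of
    `source` (assumption); a real module would learn this mapping.
    """
    gamma = 1.0 + np.tanh(source.mean(axis=(1, 2)))  # per-channel scale, shape (C,)
    beta = source.std(axis=(1, 2))                   # per-channel shift, shape (C,)
    return fused * gamma[:, None, None] + beta[:, None, None]


def reweight_channels_from_text(features, text_embedding, projection):
    """Rescale each channel by a sigmoid weight derived from a text embedding.

    projection: hypothetical (C, D) matrix mapping a D-dimensional text
        embedding to one logit per channel.
    """
    logits = projection @ text_embedding
    weights = 1.0 / (1.0 + np.exp(-logits))
    return features * weights[:, None, None]


# Toy usage with random data; shapes only, no real images or text encoder.
rng = np.random.default_rng(0)
infrared = rng.standard_normal((8, 4, 4))
visible = rng.standard_normal((8, 4, 4))
scores = rng.random(8)

fused = channel_gate(infrared + visible, scores, keep_ratio=0.5)
fused = affine_modulate(fused, visible)
fused = reweight_channels_from_text(
    fused, rng.standard_normal(16), rng.standard_normal((8, 16))
)
print(fused.shape)
```

The three steps are chained here only to show that each one preserves the (C, H, W) feature shape; where each module sits in the actual encoder and decoder is described in the paper, not in this sketch.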

Xilai Li, Xiaosong Li, Weijun Jiang • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Infrared-Visible Image Fusion | MSRS | QAB/F (edge-based fusion quality) | 0.6859 | 38 |
| Video Fusion | VTMOT | QG | 51.54 | 13 |
| Infrared and Visible Video Fusion | HDO (test) | QG | 0.5997 | 10 |
| Infrared and Visible Video Fusion | M3SVD (test) | QG | 0.6153 | 10 |
| Image Fusion | Efficiency Evaluation | FLOPs (G) | 953.9 | 10 |
| Infrared and Visible Video Fusion | HDO | QMI | 0.4603 | 8 |
| Infrared and Visible Video Fusion | M3SVD | QMI | 57.74 | 8 |
| Infrared and Visible Video Fusion | VTMOT | QMI | 0.502 | 8 |
| Infrared and Visible Image Fusion | LLVIP | QG | 61.41 | 5 |