Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion

About

Image fusion aims to combine information from different source images into a single, comprehensively representative image. Existing fusion methods typically cannot handle degradations in low-quality source images and offer no interaction to accommodate varied subjective and objective needs. To address both issues, we introduce Text-IF, a semantic-text-guided image fusion model for the degradation-aware and interactive image fusion task. It extends classical image fusion to text-guided image fusion and can address degradation and interaction jointly during fusion. Through a text semantic encoder and a semantic interaction fusion decoder, Text-IF supports all-in-one degradation-aware processing of infrared and visible images and produces flexible, interactive fusion outcomes. In this way, Text-IF achieves not only multi-modal image fusion but also multi-modal information fusion. Extensive experiments show that the proposed text-guided image fusion strategy has clear advantages over state-of-the-art (SOTA) methods in both fusion performance and degradation treatment. The code is available at https://github.com/XunpengYi/Text-IF.

Xunpeng Yi, Han Xu, Hao Zhang, Linfeng Tang, Jiayi Ma • 2024

Related benchmarks

Task	Dataset	Result
Semantic segmentation	MFNet (test)	mIoU 60.65	172
Semantic segmentation	MSRS	mIoU 68.25	120
Semantic segmentation	FMB (test)	mIoU 68.4	110
Object Detection	LLVIP	mAP50 94.1	109
Semantic segmentation	FMB	mIoU 0.5936	67
Visible-Infrared Image Fusion	MSRS (test)	Average Gradient (AG) 3.84	55
Infrared-Visible Image Fusion	RoadScene (test)	--	53
Salient Object Detection	VT5000	--	50
Infrared-Visible Image Fusion	LLVIP (test)	EN 7.24	48
Object Detection	M3FD	AP@[0.5:0.95] 62.19	45

Showing 10 of 45 rows
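
Several metrics in the table (EN, AG, mIoU) have widely used textbook definitions. The sketch below shows one way to compute them, assuming 8-bit grayscale images and label maps stored as nested Python lists, with forward-difference gradients for AG. The function names are illustrative and are not taken from the Text-IF repository, and individual benchmarks may use different normalization, border handling, or class-averaging rules, so these functions are not guaranteed to reproduce the numbers above.

```python
import math


def entropy(image, levels=256):
    """Shannon entropy (EN, in bits) of an integer-valued grayscale image."""
    counts = [0] * levels
    total = 0
    for row in image:
        for pixel in row:
            counts[pixel] += 1
            total += 1
    # Sum only over intensity levels that actually occur.
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def average_gradient(image):
    """Average gradient (AG): mean of sqrt((dx^2 + dy^2) / 2),
    using forward differences (assumed convention)."""
    rows, cols = len(image), len(image[0])
    acc = 0.0
    for i in range(rows - 1):
        for j in range(cols - 1):
            dx = image[i][j + 1] - image[i][j]
            dy = image[i + 1][j] - image[i][j]
            acc += math.sqrt((dx * dx + dy * dy) / 2.0)
    return acc / ((rows - 1) * (cols - 1))


def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for p_row, t_row in zip(pred, target):
            for p, t in zip(p_row, t_row):
                inter += (p == c and t == c)
                union += (p == c or t == c)
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Tiny hand-checkable inputs (hypothetical data, not from any benchmark).
checker = [[0, 255], [255, 0]]
pred = [[0, 1], [1, 1]]
target = [[0, 1], [0, 1]]

print(entropy(checker))            # two equally likely levels
print(average_gradient(checker))
print(mean_iou(pred, target, 2))
```

Note that `mean_iou` returns a fraction in [0, 1]; the table mixes that scale (0.5936) with percentages (60.65), so values need to be put on a common scale before comparing rows.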
