Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond

About

Multi-modality image fusion, particularly infrared and visible, plays a crucial role in integrating diverse modalities to enhance scene understanding. Although early research prioritized visual quality, preserving fine details and adapting to downstream tasks remains challenging. Recent approaches attempt task-specific design but rarely achieve "The Best of Both Worlds" due to inconsistent optimization goals. To address these issues, we propose a novel method that leverages the semantic knowledge from the Segment Anything Model (SAM) to Grow the quality of fusion results and Enable downstream task adaptability, namely SAGE. Specifically, we design a Semantic Persistent Attention (SPA) Module that efficiently maintains source information via the persistent repository while extracting high-level semantic priors from SAM. More importantly, to eliminate the impractical dependence on SAM during inference, we introduce a bi-level optimization-driven distillation mechanism with triplet losses, which allow the student network to effectively extract knowledge. Extensive experiments show that our method achieves a balance between high-quality visual results and downstream task adaptability while maintaining practical deployment efficiency. The code is available at https://github.com/RollingPlain/SAGE_IVIF.

Guanyao Wu, Haoyu Liu, Hongming Fu, Yichuan Peng, Jinyuan Liu, Xin Fan, Risheng Liu • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic segmentation | MSRS | mIoU 73 | 68 |
| Infrared-Visible Image Fusion | RoadScene (test) | Visual Information Fidelity (VIF) 0.53 | 53 |
| Salient Object Detection | VT5000 | -- | 50 |
| Semantic segmentation | FMB | mIoU 0.6078 | 49 |
| Visible-Infrared Image Fusion | MSRS (test) | -- | 43 |
| Infrared-Visible Image Fusion | MSRS | QAB/F (Quality Assessment Block/Fusion) 0.6242 | 38 |
| Infrared-Visible Image Fusion | LLVIP (test) | EN 6.96 | 36 |
| Object Detection | M3FD | AP@[0.5:0.95] 62.25 | 35 |
| Infrared-Visible Image Fusion | KAIST | AG 3.376 | 22 |
| Infrared-Visible Image Fusion | FLIR | AG 3.254 | 22 |
Showing 10 of 36 rows
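
Two of the fusion metrics listed in the benchmarks, EN and AG, are not defined on this page. The sketch below assumes the definitions commonly used in image-fusion evaluation: EN as the Shannon entropy of the gray-level histogram, and AG as the mean of sqrt((dx^2 + dy^2) / 2) over forward pixel differences. The function names `entropy` and `average_gradient` are illustrative, and the benchmark's own evaluation code may differ in details such as normalization or border handling.

```python
import math
from collections import Counter


def entropy(image):
    """Shannon entropy (in bits) of the gray-level histogram of a 2-D image.

    `image` is a list of rows of integer gray levels (assumed 0..255).
    """
    pixels = [int(v) for row in image for v in row]
    total = len(pixels)
    counts = Counter(pixels)
    # Sum -p * log2(p) over the gray levels that actually occur.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def average_gradient(image):
    """Average gradient: mean of sqrt((dx^2 + dy^2) / 2) over forward differences.

    Assumes the image has at least 2 rows and 2 columns.
    """
    rows, cols = len(image), len(image[0])
    acc = 0.0
    for i in range(rows - 1):
        for j in range(cols - 1):
            dx = image[i][j + 1] - image[i][j]  # horizontal difference
            dy = image[i + 1][j] - image[i][j]  # vertical difference
            acc += math.sqrt((dx * dx + dy * dy) / 2.0)
    return acc / ((rows - 1) * (cols - 1))


# Toy inputs (not data from the paper): a constant image and a checkerboard.
flat = [[128] * 4 for _ in range(4)]
checker = [[0 if (i + j) % 2 == 0 else 255 for j in range(4)] for i in range(4)]

print(entropy(flat), average_gradient(flat))
print(entropy(checker), average_gradient(checker))
```

On these toy inputs the constant image scores zero on both metrics, while the checkerboard has higher entropy and a much larger average gradient. This matches the usual reading of the two numbers: higher EN suggests more information content and higher AG suggests sharper detail in the fused image.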
