CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion
About
Multi-modality (MM) image fusion aims to render fused images that maintain the merits of different modalities, e.g., functional highlights and detailed textures. To tackle the challenges of modeling cross-modality features and decomposing the desired modality-specific and modality-shared features, we propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network. First, CDDFuse uses Restormer blocks to extract cross-modality shallow features. We then introduce a dual-branch Transformer-CNN feature extractor. Its Lite Transformer (LT) blocks leverage long-range attention to handle low-frequency global features, and its Invertible Neural Network (INN) blocks focus on extracting high-frequency local information. We further propose a correlation-driven loss that, based on the embedded information, makes the low-frequency features correlated while keeping the high-frequency features uncorrelated. Then, the LT-based global fusion and INN-based local fusion layers output the fused image. Extensive experiments demonstrate that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion. We also show that CDDFuse can boost performance in downstream infrared-visible semantic segmentation and object detection in a unified benchmark. The code is available at https://github.com/Zhaozixiang1228/MMIF-CDDFuse.
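The abstract states what the correlation-driven loss should achieve (correlated low-frequency features, uncorrelated high-frequency features) but not its exact form. The following is a minimal sketch under that goal. It assumes the loss is a ratio that grows with the squared correlation of the high-frequency (detail) features and shrinks as the low-frequency (base) features become more correlated. The function names `pearson_cc` and `decomposition_loss`, the constant `eps=1.01`, and the use of flattened Python lists instead of per-channel tensors are illustrative assumptions, not code taken from the repository.

```python
import math
from typing import Sequence


def pearson_cc(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two flattened feature maps."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("inputs must have equal length >= 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # Constant inputs have no variance; treat them as uncorrelated.
    return num / den if den > 0 else 0.0


def decomposition_loss(base_ir, base_vis, detail_ir, detail_vis, eps=1.01):
    """Hypothetical correlation-driven loss (assumed ratio form).

    Small when the base (low-frequency) features of the two modalities are
    correlated and their detail (high-frequency) features are not.
    eps > 1 keeps the denominator positive, since a correlation lies in [-1, 1].
    """
    cc_base = pearson_cc(base_ir, base_vis)
    cc_detail = pearson_cc(detail_ir, detail_vis)
    return cc_detail ** 2 / (cc_base + eps)


if __name__ == "__main__":
    # Desired case: base features perfectly correlated, details orthogonal.
    good = decomposition_loss([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0],
                              [1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0])
    # Undesired case: base features anti-correlated, details identical.
    bad = decomposition_loss([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0],
                             [1.0, -1.0, 1.0, -1.0], [1.0, -1.0, 1.0, -1.0])
    print(good, bad)
```

With this assumed form, the well-decomposed case yields a loss of 0, while the badly decomposed case yields a loss near 100 because the denominator approaches `-1 + 1.01`. Squaring the detail correlation penalizes both positive and negative correlation of the high-frequency features equally.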
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Object Detection | LLVIP | mAP50 | 95.7 | 104 |
| Semantic Segmentation | FMB (test) | mIoU | 55.39 | 100 |
| Object Detection | DroneVehicle (test) | -- | -- | 67 |
| Object Detection | FLIR | -- | -- | 59 |
| Infrared-Visible Image Fusion | RoadScene (test) | Visual Information Fidelity (VIF) | 0.61 | 53 |
| Object Detection | LLVIP (test) | mAP50 | 95.5 | 51 |
| Semantic Segmentation | FMB | mIoU | 0.6165 | 49 |
| Object Detection | M3FD | mAP@0.5 | 81.2 | 48 |
| Visible-Infrared Image Fusion | MSRS (test) | Average Gradient (AG) | 4.043 | 43 |
| Infrared and Visible Image Fusion | RoadScene | Qabf | 0.48 | 42 |