Unsupervised Misaligned Infrared and Visible Image Fusion via Cross-Modality Image Generation and Registration
About
Recent learning-based image fusion methods have made considerable progress on pre-registered multi-modality data, but they suffer from severe ghosting artifacts when dealing with misaligned multi-modality data, owing to spatial deformation and the difficulty of narrowing the cross-modality discrepancy. To overcome these obstacles, in this paper we present a robust cross-modality generation-registration paradigm for unsupervised misaligned infrared and visible image fusion (IVIF). Specifically, we propose a Cross-modality Perceptual Style Transfer Network (CPSTN) that takes a visible image as input and generates a pseudo infrared image. Benefiting from the favorable geometry-preservation ability of the CPSTN, the generated pseudo infrared image has a sharp structure, which, coupled with the structure sensitivity of infrared images, makes it easier to convert cross-modality image alignment into mono-modality registration. We then introduce a Multi-level Refinement Registration Network (MRRN) to predict the displacement vector field between the distorted and pseudo infrared images and to reconstruct the registered infrared image under the mono-modality setting. Moreover, to better fuse the registered infrared images and visible images, we present a feature Interaction Fusion Module (IFM) that adaptively selects more meaningful features for fusion in the Dual-path Interaction Fusion Network (DIFN). Extensive experimental results suggest that the proposed method achieves superior performance on misaligned cross-modality image fusion.
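To make the registration step more concrete, the sketch below shows how a dense displacement vector field can be used to resample a distorted image. It is an illustration only and is not the MRRN: the field is supplied by hand rather than predicted, and the function name `warp_with_displacement`, the `(dx, dy)` backward-warping convention, and the border clamping are all assumptions made for this example.

```python
import math


def warp_with_displacement(image, flow):
    """Resample `image` with a dense displacement field (backward warping).

    image: 2-D list of floats, shape H x W.
    flow:  2-D list of (dx, dy) pairs, shape H x W. Output pixel (y, x) is
           sampled from `image` at (y + dy, x + dx).  (Assumed convention.)
    Sample positions outside the image are clamped to the border.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            # Source coordinates, clamped to the valid pixel range.
            sx = min(max(x + dx, 0.0), w - 1.0)
            sy = min(max(y + dy, 0.0), h - 1.0)
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            ax, ay = sx - x0, sy - y0
            # Bilinear interpolation between the four neighbouring pixels.
            top = (1 - ax) * image[y0][x0] + ax * image[y0][x1]
            bottom = (1 - ax) * image[y1][x0] + ax * image[y1][x1]
            out[y][x] = (1 - ay) * top + ay * bottom
    return out


if __name__ == "__main__":
    img = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    # Every output pixel samples one pixel to its right in the input.
    shift = [[(1.0, 0.0)] * 3 for _ in range(3)]
    for row in warp_with_displacement(img, shift):
        print(row)
```

In a learned pipeline the field would come from a registration network and the resampling would typically be done by a differentiable sampler so that gradients can flow back to the predictor; the loop above only shows the geometry of the operation.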
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Semantic segmentation | MFNet (test) | mIoU 49.1 | 134 | |
| Object Detection | M3FD dataset | -- | 48 | |
| Visible-Infrared Image Fusion | MSRS (test) | Average Gradient (AG) 2.16 | 43 | |
| Semantic segmentation | MSRS | mIoU 63.48 | 42 | |
| Infrared-Visible Image Fusion | RoadScene (test) | Average Gradient (AG) 4.18 | 40 | |
| Object Detection | LLVIP (test) | mAP50 95 | 38 | |
| Object Detection | MSRS (test) | mAP@0.5 95.3 | 34 | |
| Multi-Modal Image Fusion | MRI-CT (test) | EN 5.56 | 30 | |
| Infrared and Visible Image Fusion | TNO image fusion | -- | 30 | |
| Homography Estimation | RGB-NIR | MACE 22.38 | 19 | |
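
Two of the fusion metrics in the table, Average Gradient (AG) and entropy (EN), are simple image statistics. The sketch below uses definitions that are common in the image fusion literature: AG as the mean of sqrt((gx² + gy²) / 2) over forward differences, and EN as the Shannon entropy of the gray-level histogram in bits. Whether the benchmark figures above were computed with exactly these conventions is an assumption, and the function names are chosen for this example.

```python
import math
from collections import Counter


def average_gradient(img):
    """Mean gradient magnitude of a 2-D gray image (list of rows).

    Uses forward differences and the sqrt((gx^2 + gy^2) / 2) form,
    averaged over the (H-1) x (W-1) interior grid.
    """
    h, w = len(img), len(img[0])
    total = 0.0
    for y in range(h - 1):
        for x in range(w - 1):
            gx = img[y][x + 1] - img[y][x]
            gy = img[y + 1][x] - img[y][x]
            total += math.sqrt((gx * gx + gy * gy) / 2.0)
    return total / ((h - 1) * (w - 1))


def entropy(img):
    """Shannon entropy (bits) of the gray-level histogram of `img`."""
    counts = Counter(v for row in img for v in row)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    flat = [[7, 7], [7, 7]]            # no edges, a single gray level
    edges = [[0, 255], [0, 255]]       # one vertical edge, two gray levels
    print(average_gradient(flat), entropy(flat))
    print(average_gradient(edges), entropy(edges))
```

Higher AG indicates sharper local detail and higher EN indicates a richer gray-level distribution, which is why both are reported for fused images; neither metric measures alignment quality directly.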