Hybrid Fusion: One-Minute Efficient Training for Zero-Shot Cross-Domain Image Fusion
About
Image fusion seeks to integrate complementary information from multiple sources into a single, superior image. While traditional methods are fast, they lack adaptability and performance. Conversely, deep learning approaches achieve state-of-the-art (SOTA) results but suffer from critical inefficiencies: their reliance on slow, resource-intensive, patch-based training introduces a significant gap with full-resolution inference. We propose a novel hybrid framework that resolves this trade-off. Our method uses a learnable U-Net to generate a dynamic guidance map that directs a classic, fixed Laplacian pyramid fusion kernel. This decoupling of policy learning from pixel synthesis enables remarkably efficient full-resolution training, eliminating the train-inference gap. Consequently, our model, trained from scratch without any external model, achieves SOTA-comparable performance in about one minute on an RTX 4090 or two minutes on a consumer laptop GPU, and demonstrates powerful zero-shot generalization across diverse tasks, from infrared-visible to medical imaging. By design, the fused output is linearly constructed solely from source information, ensuring high faithfulness for critical applications. The code is available at https://github.com/Zirconium233/HybridFusion
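To make the idea of a guidance map steering a fixed Laplacian pyramid kernel concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes grayscale inputs whose sides are divisible by 2^levels, uses 2x2 average pooling and nearest-neighbour upsampling as stand-ins for the usual Gaussian filters, and takes the guidance map as a plain array in [0, 1] where the paper would use the U-Net output. All function names are hypothetical.

```python
import numpy as np

def downsample(img):
    # 2x2 average pooling (assumed stand-in for Gaussian blur + decimation).
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    # Nearest-neighbour 2x upsampling.
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels):
    # Band-pass residuals at each scale, plus the coarsest low-pass image.
    pyramid, current = [], img
    for _ in range(levels):
        down = downsample(current)
        pyramid.append(current - upsample(down))
        current = down
    pyramid.append(current)
    return pyramid

def gaussian_pyramid(img, levels):
    # Progressively downsampled copies, used here to resize the guidance map.
    pyramid = [img]
    for _ in range(levels):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

def reconstruct(pyramid):
    # Collapse the pyramid from coarse to fine.
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        current = upsample(current) + band
    return current

def guided_fusion(src_a, src_b, guidance, levels=3):
    # Per-level convex blend of the two source pyramids. The only free
    # quantity is `guidance`; every output pixel is a linear combination
    # of source coefficients.
    lap_a = laplacian_pyramid(src_a, levels)
    lap_b = laplacian_pyramid(src_b, levels)
    weights = gaussian_pyramid(guidance, levels)
    fused = [w * a + (1.0 - w) * b for w, a, b in zip(weights, lap_a, lap_b)]
    return reconstruct(fused)
```

In this sketch a guidance map of all ones returns the first source unchanged, all zeros returns the second, and identical sources are reproduced regardless of the map, which illustrates the sense in which the output is built solely from source information.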
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Object Detection | MSRS (test) | mAP@0.5 | 95.18 | 34 |
| Multi-Modal Image Fusion | MRI-CT (test) | EN | 4.988 | 30 |
| Infrared and Visible Image Fusion | RoadScene | MI | 4.41 | 28 |
| Infrared-Visible Image Fusion | MSRS | Entropy (EN) | 6.766 | 23 |
| Medical image fusion | PET-MRI (test) | SSIM | 1.253 | 14 |
| Medical image fusion | SPECT-MRI (test) | SSIM | 1.27 | 14 |
| Image Fusion | MSRS (test) | VIF | 1.079 | 13 |
| Object Detection | MSRS | mAP@50 | 95.18 | 10 |