Hybrid Fusion: One-Minute Efficient Training for Zero-Shot Cross-Domain Image Fusion

About

Image fusion seeks to integrate complementary information from multiple sources into a single, superior image. While traditional methods are fast, they lack adaptability and performance. Conversely, deep learning approaches achieve state-of-the-art (SOTA) results but suffer from critical inefficiencies: their reliance on slow, resource-intensive, patch-based training introduces a significant gap with full-resolution inference. We propose a novel hybrid framework that resolves this trade-off. Our method uses a learnable U-Net to generate a dynamic guidance map that directs a classic, fixed Laplacian pyramid fusion kernel. This decoupling of policy learning from pixel synthesis enables remarkably efficient full-resolution training, eliminating the train-inference gap. Consequently, our model, trained from scratch without any external model, achieves SOTA-comparable performance in about one minute on an RTX 4090 or two minutes on a consumer laptop GPU, and demonstrates powerful zero-shot generalization across diverse tasks, from infrared-visible to medical imaging. By design, the fused output is linearly constructed solely from source information, ensuring high faithfulness for critical applications. The code is available at https://github.com/Zirconium233/HybridFusion
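
The idea of a guidance map steering a fixed Laplacian pyramid kernel can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository for that): the function names are hypothetical, the guidance map is a plain array rather than a U-Net output, and 2x2 average pooling with nearest-neighbour upsampling stands in for the usual Gaussian blur and interpolation. It only shows why the fused result is a linear combination of the two sources.

```python
import numpy as np

def downsample(img):
    """Halve resolution with 2x2 average pooling (simplified blur + decimate)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double resolution by nearest-neighbour repetition."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def gaussian_pyramid(img, levels):
    """Return [full-res, half-res, ...] with `levels` entries."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(downsample(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels):
    """Band-pass details per level; the last entry is the low-pass residual."""
    gauss = gaussian_pyramid(img, levels)
    lap = [g - upsample(g_next) for g, g_next in zip(gauss[:-1], gauss[1:])]
    lap.append(gauss[-1])
    return lap

def guided_pyramid_fusion(src_a, src_b, guidance, levels=3):
    """Blend two sources level by level using a guidance map in [0, 1].

    Assumption: `guidance` plays the role of the learned map; 1 selects
    src_a and 0 selects src_b at that pixel and scale.
    """
    lap_a = laplacian_pyramid(src_a, levels)
    lap_b = laplacian_pyramid(src_b, levels)
    weights = gaussian_pyramid(guidance, levels)
    fused = [w * a + (1.0 - w) * b for w, a, b in zip(weights, lap_a, lap_b)]
    # Collapse the fused pyramid from coarse to fine.
    out = fused[-1]
    for band in reversed(fused[:-1]):
        out = upsample(out) + band
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ir = rng.random((64, 64))    # stand-in for an infrared image
    vis = rng.random((64, 64))   # stand-in for a visible image
    guide = rng.random((64, 64)) # stand-in for the network's guidance map
    print(guided_pyramid_fusion(ir, vis, guide).shape)
```

In this sketch the image height and width must be divisible by 2^(levels-1). Because every step is linear in the sources for a fixed guidance map, no pixel content is synthesized beyond what the sources supply, which is the faithfulness property the abstract describes.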

Ran Zhang, Xuanhua He, Liu Liu • 2026

Related benchmarks

Task	Dataset	Metric	Result	
Infrared and Visible Image Fusion	RoadScene	Qabf	0.649	42
Infrared-Visible Image Fusion	MSRS	Qabf	0.721	38
Object Detection	MSRS (test)	mAP@0.5	95.18	34
Multi-Modal Image Fusion	MRI-CT (test)	EN	4.988	30
Medical image fusion	PET-MRI (test)	SSIM	1.253	14
Medical image fusion	SPECT-MRI (test)	SSIM	1.27	14
Image Fusion	MSRS (test)	VIF	1.079	13
Object Detection	MSRS	--	--	11
