Efficient Rectified Flow for Image Fusion

About

Image fusion is a fundamental task in computer vision that combines complementary information from different modalities into a single image. In recent years, diffusion models have advanced the field considerably. However, they typically require heavy computation and long inference times, which limits their applicability. To address this issue, we propose RFfusion, an efficient one-step diffusion model for image fusion based on Rectified Flow. We incorporate Rectified Flow into the image fusion task to straighten the sampling path of the diffusion model, achieving one-step sampling without additional training while maintaining high-quality fusion results. Furthermore, we propose a task-specific variational autoencoder (VAE) architecture tailored for image fusion, in which the fusion operation is embedded in the latent space to further reduce computational complexity. To address the inherent discrepancy between conventional reconstruction-oriented VAE objectives and the requirements of image fusion, we introduce a two-stage training strategy. This strategy facilitates the learning and integration of complementary information from multi-modal source images, enabling the model to retain fine-grained structural details while significantly improving inference efficiency. Extensive experiments demonstrate that our method outperforms other state-of-the-art methods in both inference speed and fusion quality. Code is available at https://github.com/zirui0625/RFfusion.
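
To illustrate why a straightened sampling path permits one-step sampling, here is a minimal toy sketch in plain Python. It is not the RFfusion implementation: the names `euler_sample` and `make_straight_velocity` are hypothetical, and the velocity field is a hand-written analytic stand-in for the learned network, which in the paper operates on VAE latents. The sketch only shows that when trajectories are straight lines, a single Euler step from t=0 to t=1 lands on the same endpoint as a multi-step integration.

```python
from typing import Callable, List

Vector = List[float]
Velocity = Callable[[Vector, float], Vector]


def euler_sample(x0: Vector, velocity: Velocity, num_steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) is never zero
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_straight_velocity(target: Vector) -> Velocity:
    """Toy field (assumption): every trajectory is a straight line ending at `target`."""
    def velocity(x: Vector, t: float) -> Vector:
        # On the line x_t = (1 - t) * x_0 + t * target, this equals target - x_0.
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


noise = [0.5, -1.0, 2.0]    # stand-in for the initial sample
target = [1.0, 0.0, -1.0]   # stand-in for the fused latent
velocity = make_straight_velocity(target)

one_step = euler_sample(noise, velocity, num_steps=1)
ten_steps = euler_sample(noise, velocity, num_steps=10)
```

In this toy setting `one_step` and `ten_steps` agree up to floating-point error, because the velocity is constant along each straight path. With a learned velocity field the agreement is only approximate and depends on how straight the trajectories actually are.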

Zirui Wang, Jiayi Zhang, Tianwei Guan, Yuhan Zhou, Xingyuan Li, Minjing Dong, Jinyuan Liu • 2025

Related benchmarks

Task	Dataset	Result
Semantic segmentation	MSRS	mIoU 56.2	120
Video Fusion	VTMOT	QG 48.99	13
Infrared-Visible Image Fusion	MSRS	MUSIQ 42.81	11
Object Detection	MSRS	mAP 68.2	11
Infrared and Visible Video Fusion	HDO (test)	QG 0.4478	10
Infrared and Visible Video Fusion	M3SVD (test)	QG 0.4371	10
Medical image fusion	Harvard dataset Noise degradation	Q_MI 60.85	9
Medical image fusion	Harvard dataset Composite degradation	Q_MI 0.5931	9
Object Detection	M3FD	Precision 95.42	9
Medical image fusion	Harvard dataset Blur degradation	Q_MI 0.5831	9

