SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation

About

The Distribution Matching Distillation (DMD) has been successfully applied to text-to-image diffusion models such as Stable Diffusion (SD) 1.5. However, vanilla DMD suffers from convergence difficulties on large-scale flow-based text-to-image models, such as SD 3.5 and FLUX. In this paper, we first analyze the issues when applying vanilla DMD on large-scale models. Then, to overcome the scalability challenge, we propose implicit distribution alignment (IDA) to constrain the divergence between the generator and the fake distribution. Furthermore, we propose intra-segment guidance (ISG) to relocate the timestep denoising importance from the teacher model. With IDA alone, DMD converges for SD 3.5; employing both IDA and ISG, DMD converges for SD 3.5 and FLUX.1 dev. Together with a scaled VFM-based discriminator, our final model, dubbed \textbf{SenseFlow}, achieves superior performance in distillation for both diffusion based text-to-image models such as SDXL, and flow-matching models such as SD 3.5 Large and FLUX.1 dev. The source code is available at \href{https://github.com/XingtongGe/SenseFlow}{https://github.com/XingtongGe/SenseFlow}

Xingtong Ge, Xin Zhang, Tongda Xu, Yi Zhang, Xinjie Zhang, Yan Wang, Jun Zhang• 2025

Related benchmarks

Task	Dataset	Result
Text-to-Image Generation	GenEval	GenEval Score70.98	442
Text-to-Image Generation	DPG-Bench	DPG Score79.86	156
Image Generation	ImageNet-256 (FID-50K)	FID4.2	36
Text-to-Image Generation	OneIG-Bench	Alignment0.776	33
Text-to-Image Generation	COCO 5k	CLIP Score0.3287	19
Text-to-Image Generation	MS-COCO 10K prompts 2014 (val)	FID34.1	19
Text-to-Image Generation	HPS prompt set v2	CLIP Score0.283	11
Text-to-Image Generation	Align5000 1.0 (test)	CLIP Score0.311	9
Text-to-Image Generation	Text-to-Image Evaluation Prompts	CLIP Score0.3332	8

Showing 9 of 9 rows

Other info

Follow for update

@wizwand_team Discord