CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models

About

Classifier-Free Guidance (CFG) is a widely adopted technique in diffusion/flow models to improve image fidelity and controllability. In this work, we first analytically study the effect of CFG on flow matching models trained on Gaussian mixtures where the ground-truth flow can be derived. We observe that in the early stages of training, when the flow estimation is inaccurate, CFG directs samples toward incorrect trajectories. Building on this observation, we propose CFG-Zero*, an improved CFG with two contributions: (a) optimized scale, where a scalar is optimized to correct for the inaccuracies in the estimated velocity, hence the * in the name; and (b) zero-init, which involves zeroing out the first few steps of the ODE solver. Experiments on both text-to-image (Lumina-Next, Stable Diffusion 3, and Flux) and text-to-video (Wan-2.1) generation demonstrate that CFG-Zero* consistently outperforms CFG, highlighting its effectiveness in guiding Flow Matching models. (Code is available at github.com/WeichenFan/CFG-Zero-star)
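The two contributions named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption: the optimized scale is taken to be the least-squares coefficient that projects the conditional velocity onto the unconditional one, the guided velocity is standard CFG with the unconditional branch rescaled by that coefficient, and zero-init returns a zero velocity for the first `zero_init_steps` solver steps. The names `optimized_scale`, `guided_velocity`, `euler_sample`, and `velocity_fn` are hypothetical and are not taken from the authors' repository.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def optimized_scale(v_cond: Sequence[float], v_uncond: Sequence[float],
                    eps: float = 1e-8) -> float:
    """Assumed form of the optimized scale: the scalar s that minimizes
    ||v_cond - s * v_uncond||^2, i.e. <v_cond, v_uncond> / ||v_uncond||^2."""
    return dot(v_cond, v_uncond) / (dot(v_uncond, v_uncond) + eps)


def guided_velocity(v_cond: Sequence[float], v_uncond: Sequence[float],
                    w: float, step: int, zero_init_steps: int = 1) -> Vector:
    """Guided velocity with zero-init and a rescaled unconditional branch."""
    # Zero-init (assumed): contribute no velocity during the first few steps.
    if step < zero_init_steps:
        return [0.0] * len(v_cond)
    s = optimized_scale(v_cond, v_uncond)
    # Standard CFG, v_u + w * (v_c - v_u), with v_u replaced by s * v_u.
    return [s * u + w * (c - s * u) for c, u in zip(v_cond, v_uncond)]


def euler_sample(x0: Sequence[float],
                 velocity_fn: Callable[[Vector, float], Tuple[Vector, Vector]],
                 num_steps: int, w: float, zero_init_steps: int = 1) -> Vector:
    """Integrate the guided flow from t=0 to t=1 with explicit Euler steps.
    velocity_fn(x, t) is a hypothetical model wrapper returning the pair
    (conditional velocity, unconditional velocity)."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        v_c, v_u = velocity_fn(x, i * dt)
        v = guided_velocity(v_c, v_u, w, i, zero_init_steps)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

With `zero_init_steps=0` and the scale fixed at 1, the sketch reduces to ordinary CFG, which makes the two modifications easy to toggle independently when comparing against a baseline.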

Weichen Fan, Amber Yijia Zheng, Raymond A. Yeh, Ziwei Liu • 2025

Related benchmarks

Task	Dataset	Metric	Result
Class-conditional Image Generation	ImageNet 256x256	Inception Score (IS)	258.9	967
Text-to-Image Generation	GenEval	Overall Score	59.24	517
Text-to-Video Generation	VBench	Quality Score	84.51	168
Text-to-Image Generation	Pick-a-Pic	PickScore	22.36	150
Text-to-Image Generation	DrawBench	Pick Score	23.18	40
Text-to-Image Generation	LAION 5B 1K	HPSv2.1	28.272	18
Text-to-Image Generation	MS COCO 1K	HPSv2.1	28.296	18
Shape completion	MedShapeNet (test)	Dice Score	54.44	16
Text to Image	MS-COCO 5k image-text pairs	FID	20.317	15
Text-to-Image Generation	DrawBench	HPS	29.04	14

Showing 10 of 12 rows
