
CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models

About

Classifier-Free Guidance (CFG) is a widely adopted technique in diffusion/flow models to improve image fidelity and controllability. In this work, we first analytically study the effect of CFG on flow matching models trained on Gaussian mixtures where the ground-truth flow can be derived. We observe that in the early stages of training, when the flow estimation is inaccurate, CFG directs samples toward incorrect trajectories. Building on this observation, we propose CFG-Zero*, an improved CFG with two contributions: (a) optimized scale, where a scalar is optimized to correct for the inaccuracies in the estimated velocity, hence the * in the name; and (b) zero-init, which involves zeroing out the first few steps of the ODE solver. Experiments on both text-to-image (Lumina-Next, Stable Diffusion 3, and Flux) and text-to-video (Wan-2.1) generation demonstrate that CFG-Zero* consistently outperforms CFG, highlighting its effectiveness in guiding Flow Matching models. (Code is available at github.com/WeichenFan/CFG-Zero-star)
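The two components described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the optimized scalar is the least-squares projection coefficient of the conditional velocity onto the unconditional one, and that "zero-init" means returning a zero velocity for the first `zero_init_steps` solver steps. The function names, the `velocity_fn` callback, the Euler solver, the time direction (t from 0 to 1), and the default of one zeroed step are all hypothetical choices made for this example.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def optimized_scale(v_cond: Sequence[float], v_uncond: Sequence[float],
                    eps: float = 1e-8) -> float:
    """Scalar s minimizing ||v_cond - s * v_uncond||^2 (assumed form of the scale)."""
    dot = sum(c * u for c, u in zip(v_cond, v_uncond))
    norm_sq = sum(u * u for u in v_uncond)
    return dot / (norm_sq + eps)  # eps guards against a zero unconditional velocity


def cfg_zero_star_velocity(v_cond: Sequence[float], v_uncond: Sequence[float],
                           guidance_scale: float, step: int,
                           zero_init_steps: int = 1) -> Vector:
    """Guided velocity with zero-init and a rescaled unconditional branch."""
    # Zero-init: skip the update entirely during the first few solver steps.
    if step < zero_init_steps:
        return [0.0] * len(v_cond)
    s = optimized_scale(v_cond, v_uncond)
    # Standard CFG combination, with v_uncond replaced by s * v_uncond.
    return [s * u + guidance_scale * (c - s * u) for c, u in zip(v_cond, v_uncond)]


def euler_sample(x0: Sequence[float],
                 velocity_fn: Callable[[Vector, float], Tuple[Vector, Vector]],
                 num_steps: int, guidance_scale: float,
                 zero_init_steps: int = 1) -> Vector:
    """Fixed-step Euler integration; velocity_fn returns (v_cond, v_uncond)."""
    x = list(x0)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        v_cond, v_uncond = velocity_fn(x, step * dt)
        v = cfg_zero_star_velocity(v_cond, v_uncond, guidance_scale,
                                   step, zero_init_steps)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In a real pipeline `velocity_fn` would wrap two forward passes of the flow model (with and without the text condition) and the vectors would be flattened latent tensors. Setting `zero_init_steps=0` and fixing the scale to 1 would recover ordinary CFG, which makes the two changes easy to ablate separately.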

Weichen Fan, Amber Yijia Zheng, Raymond A. Yeh, Ziwei Liu • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| Class-conditional Image Generation | ImageNet 256x256 | Inception Score (IS): 258.9 | 815 |
| Text-to-Image Generation | GenEval | Overall Score: 59.24 | 506 |
| Text-to-Video Generation | VBench | Quality Score: 84.51 | 155 |
| Text-to-Image Generation | Pick-a-Pic | ImageReward: 1.07 | 107 |
| Text-to-Image Generation | DrawBench | Pick Score: 23.18 | 40 |
| Text-to-Image Generation | LAION 5B 1K | HPSv2.1: 28.272 | 18 |
| Text-to-Image Generation | MS COCO 1K | HPSv2.1: 28.296 | 18 |
| Text to Image | MS-COCO 5k image-text pairs | FID: 20.317 | 15 |
| Video Generation | VideoJAM-bench | Motion Score: 98.01 | 10 |
| Video Editing | VACE-Benchmark (test) | SC Score: 93.8 | 8 |
