
C$^2$FG: Control Classifier-Free Guidance via Score Discrepancy Analysis

About

Classifier-Free Guidance (CFG) is a cornerstone of modern conditional diffusion models, yet its reliance on a fixed or heuristically scheduled guidance weight is predominantly empirical and overlooks the inherent dynamics of the diffusion process. In this paper, we provide a rigorous theoretical analysis of Classifier-Free Guidance. Specifically, we establish strict upper bounds on the score discrepancy between the conditional and unconditional distributions at different timesteps of the diffusion process. This result explains the limitations of fixed-weight strategies and establishes a principled foundation for time-dependent guidance. Motivated by this insight, we introduce Control Classifier-Free Guidance (C$^2$FG), a novel, training-free, plug-in method that aligns the guidance strength with the diffusion dynamics via an exponential decay control function. Extensive experiments demonstrate that C$^2$FG is effective and broadly applicable across diverse generative tasks, and that it is orthogonal to existing strategies.
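
As a rough illustration of what time-dependent guidance means in practice, the sketch below applies the standard CFG combination, `eps_uncond + w * (eps_cond - eps_uncond)`, with a weight that varies along the sampling trajectory. The abstract does not give the form of the control function, so `exp_decay_weight` and its parameters (`w_max`, `w_min`, `decay_rate`) and the normalized progress variable `s` are hypothetical choices made for this example, not the paper's definition.

```python
import math
from typing import List, Sequence


def cfg_combine(eps_uncond: Sequence[float],
                eps_cond: Sequence[float],
                weight: float) -> List[float]:
    """Standard CFG combination of two noise predictions, elementwise.

    weight = 0 returns the unconditional prediction,
    weight = 1 returns the conditional prediction,
    weight > 1 extrapolates past the conditional prediction.
    """
    return [u + weight * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def exp_decay_weight(s: float,
                     w_max: float = 4.0,
                     w_min: float = 1.0,
                     decay_rate: float = 3.0) -> float:
    """Hypothetical exponential-decay schedule (an assumption, not the paper's).

    s is a normalized progress variable in [0, 1]. How s maps onto the
    diffusion timestep is left to the caller, since the abstract does not
    specify it. The weight starts at w_max for s = 0 and decays toward w_min.
    """
    return w_min + (w_max - w_min) * math.exp(-decay_rate * s)


if __name__ == "__main__":
    # Toy two-dimensional "noise predictions" standing in for model outputs.
    eps_u = [0.0, 0.5]
    eps_c = [1.0, 2.0]
    for s in (0.0, 0.5, 1.0):
        w = exp_decay_weight(s)
        print(f"s={s:.1f}  w={w:.3f}  guided={cfg_combine(eps_u, eps_c, w)}")
```

A fixed-weight baseline corresponds to calling `cfg_combine` with the same `weight` at every step; the schedule only changes how that scalar is chosen.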

Jiayang Gao, Tianyi Zheng, Jiayang Zou, Fengxiang Yang, Shice Liu, Luyao Fan, Zheyu Zhang, Hao Zhang, Jinwei Chen, Peng-Tao Jiang, Bo Li, Jia Wang • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Class-conditional Image Generation | ImageNet 256x256 (val) | FID 1.41 | 427 |
| Class-conditional Image Generation | ImageNet 64x64 | FID 1.03 | 156 |
| Text-to-Image Generation | MS-COCO | FID 5.28 | 131 |
| Class-conditional Image Generation | ImageNet 512x512 (val) | FID (Val) 6.54 | 97 |
| Text-to-Image Generation | SD 3-medium (2B) (evaluation) | CLIP Score 0.315 | 11 |
| Text-to-Image Generation | MS-COCO SD1.5 | FID (10k) 16.71 | 4 |
| Conditional Image Generation | ImageNet SiT | FID (10k Samples) 3.2 | 2 |
| Conditional Image Generation | ImageNet 512x512 10k samples | FID 5.15 | 2 |
| Text-to-Image Generation | Flux T2I | CLIP Score 31.5 | 2 |
