It's Never Too Late: Noise Optimization for Collapse Recovery in Trained Diffusion Models

About

Contemporary text-to-image models exhibit a surprising degree of mode collapse, as can be seen when sampling several images given the same text prompt. Previous work has attempted to address this issue by steering the model using guidance mechanisms, or by generating a large pool of candidates and refining them. In this work, we take a different direction and aim for diversity in generations via noise optimization. Specifically, we show that a simple noise optimization objective can mitigate mode collapse while preserving the fidelity of the base model. We also analyze the frequency characteristics of the noise and show that alternative noise initializations with different frequency profiles can improve both optimization and search. Our experiments demonstrate that noise optimization yields superior results in terms of generation quality and diversity.
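To make the idea of a noise initialization with a non-white frequency profile concrete, here is a minimal NumPy sketch. The abstract does not say which profiles the authors use, so the power-law (1/f^alpha) shaping is an assumption chosen for illustration. The names `shaped_noise`, `low_freq_fraction`, `alpha`, and `cutoff` are likewise hypothetical and are not taken from the paper.

```python
import numpy as np


def _radial_frequency(h, w):
    """Radial frequency (cycles per sample) of every 2D FFT bin."""
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    return np.sqrt(fx**2 + fy**2)


def shaped_noise(shape, alpha=1.0, seed=0):
    """Gaussian noise whose power spectrum falls off roughly as 1/f**alpha.

    alpha=0 leaves the spectrum flat (white noise); alpha>0 shifts energy
    toward low spatial frequencies. Illustrative assumption only.
    """
    h, w = shape
    rng = np.random.default_rng(seed)
    white = rng.standard_normal((h, w))

    radius = _radial_frequency(h, w)
    radius[0, 0] = 1.0                  # avoid dividing by zero at the DC bin
    scale = radius ** (-alpha / 2.0)    # amplitude scale = sqrt of power scale

    # The scale is symmetric in frequency, so the inverse FFT is real
    # up to floating-point error.
    shaped = np.fft.ifft2(np.fft.fft2(white) * scale).real

    # Re-standardize to zero mean and unit variance, the scale a
    # diffusion sampler usually expects for its initial noise.
    return (shaped - shaped.mean()) / shaped.std()


def low_freq_fraction(x, cutoff=0.1):
    """Fraction of spectral power below the given radial frequency."""
    power = np.abs(np.fft.fft2(x)) ** 2
    mask = _radial_frequency(*x.shape) < cutoff
    return power[mask].sum() / power.sum()


if __name__ == "__main__":
    for a in (0.0, 1.0, 2.0):
        z = shaped_noise((64, 64), alpha=a, seed=1)
        print(a, low_freq_fraction(z))
```

Running the script prints the low-frequency power fraction for each `alpha`. That fraction is larger for the shaped noise than for white noise, which is the property a frequency-aware initialization would rely on.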

Anne Harrington, A. Sophia Koepke, Shyamgopal Karthik, Trevor Darrell, Alexei A. Efros • 2025

Related benchmarks

Task	Dataset	Metric	Result
Text-to-Image Generation	COCO	FID	9.03	104
Text-to-Image Generation	GenEval	DINO	0.786	18
Text-to-Image Generation	GenEval	DrSim	0.446	15
Text-to-Image Generation	T2I-CompBench 1.0 (test)	CLIP Score	0.344	14
Text-to-Image Generation	PartiPrompts 1632 prompts x 4 images	InBSim	0.74	12
Text-to-Image Generation	DrawBench 1.0 (test)	InBSim	0.668	12
Text-to-Image Generation	PartiPrompts 1.0 (test)	InBSim	0.74	12
Text-to-Image Generation	T2I-CompBench	DINO Score	0.799	9
Text-to-Image Generation	GenEval	DreamSim Score	0.477	6
Human Preference	GenEval single-object	Win Rate vs i.i.d.	90	3
