
Determinism of Randomness: Prompt-Residual Seed Shaping for Diffusion Generation

About

Diffusion models start generation from an isotropic Gaussian latent, yet changing only the random seed can lead to large differences in prompt faithfulness, composition, and visual quality. We study this seed sensitivity through the semantic map from initial noise to generated meaning. Although the sampling flow is locally invertible, the subsequent semantic projection is many-to-one, which induces a degenerate pullback semi-metric on the latent space: most local directions are nearly semantically invariant, while semantically sensitive variation is concentrated in a much smaller horizontal subspace. This geometry offers an explanation for the seed lottery. Motivated by this view, we introduce a training-free prompt-residual seed-shaping procedure. Rather than claiming to recover the exact horizontal space, the method uses a single high-noise cold-start prompt residual as a model-coupled proxy, injects only its tangential component, and retracts the seed to the original Gaussian radius shell. This keeps the initialization compatible with the Gaussian prior while adding only one conditional/unconditional probe before standard sampling. Across multiple generation benchmarks, the method improves alignment and quality metrics over standard sampling, supporting both the practical value of the proxy and the explanatory relevance of semantic anisotropy.
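The geometric argument and the shaping step can be written compactly as below. The notation is ours, not necessarily the paper's: $\Phi$ is the locally invertible sampling flow from a seed $z \in \mathbb{R}^n$ to an output, $\pi$ is the many-to-one semantic projection into a $k$-dimensional semantic space with $k \ll n$, $\epsilon_\theta(z, T, c)$ is the noise prediction at the highest noise level $T$ for prompt $c$ (with $\varnothing$ the empty prompt), and $\eta$ is a hypothetical step size. The update rule is one plausible reading of "inject the tangential component, then retract to the radius shell", not the authors' exact formulation.

```latex
\begin{aligned}
  S &= \pi \circ \Phi : \mathbb{R}^n \to \mathbb{R}^k, \qquad
  J(z) = \frac{\partial S}{\partial z} \in \mathbb{R}^{k \times n},\\
  g_z(u, v) &= \langle J(z)\,u,\; J(z)\,v \rangle = u^\top J(z)^\top J(z)\, v, \qquad
  \operatorname{rank}\!\big(J(z)^\top J(z)\big) \le k \ll n,\\
  r &= \epsilon_\theta(z, T, c) - \epsilon_\theta(z, T, \varnothing), \qquad
  r_\perp = r - \frac{\langle r, z \rangle}{\lVert z \rVert^2}\, z,\\
  z' &= \lVert z \rVert \, \frac{z + \eta\, r_\perp}{\lVert z + \eta\, r_\perp \rVert}.
\end{aligned}
```

Every direction in the null space of $J(z)^\top J(z)$ has zero length under $g_z$, which is why $g$ is only a semi-metric; the horizontal subspace is the orthogonal complement of that null space and has dimension at most $k$. Since $r_\perp \perp z$, the denominator satisfies $\lVert z + \eta r_\perp \rVert^2 = \lVert z \rVert^2 + \eta^2 \lVert r_\perp \rVert^2 > 0$, so the retraction is always well defined.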

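The seed-shaping procedure can be sketched in a few lines of NumPy. This is a minimal sketch, assuming a noise-prediction (epsilon) model behind a hypothetical interface `eps_model(latent, t, cond)` where `cond=None` selects the unconditional branch. The names `shape_seed`, `eps_model`, `t_max`, `eta`, and `toy_eps` are illustrative and do not come from the paper; the abstract specifies neither the step size `eta` nor its sign, so both are treated as tunable assumptions. The toy linear predictor only stands in for a real diffusion model.

```python
from typing import Callable, Optional

import numpy as np

# Hypothetical interface (not from the paper): eps_model(latent, t, cond) returns a
# noise prediction with the latent's shape; cond=None selects the unconditional branch.
NoiseModel = Callable[[np.ndarray, int, Optional[np.ndarray]], np.ndarray]


def shape_seed(z: np.ndarray, eps_model: NoiseModel, cond: np.ndarray,
               t_max: int, eta: float) -> np.ndarray:
    """Sketch of prompt-residual seed shaping; eta and its sign are assumptions."""
    z = np.asarray(z, dtype=np.float64)
    # Cold-start probe at the highest noise level: conditional minus unconditional
    # prediction. Written as two calls here; a real pipeline would likely batch them.
    r = eps_model(z, t_max, cond) - eps_model(z, t_max, None)
    # Remove the radial component so only the part tangent to the sphere remains.
    r_tan = r - (np.vdot(z, r) / np.vdot(z, z)) * z
    # Step along the tangential residual, then retract onto the original radius shell.
    z_new = z + eta * r_tan
    return z_new * (np.linalg.norm(z) / np.linalg.norm(z_new))


# Usage with a toy linear "noise predictor" standing in for a real diffusion model.
rng = np.random.default_rng(0)
n = 64
W = rng.standard_normal((n, n)) / np.sqrt(n)
prompt = rng.standard_normal(n)


def toy_eps(latent: np.ndarray, t: int, cond: Optional[np.ndarray]) -> np.ndarray:
    out = 0.5 * latent  # shared term, cancels in the residual
    if cond is not None:
        out = out + W @ cond  # prompt-dependent term
    return out


z0 = rng.standard_normal(n)
z1 = shape_seed(z0, toy_eps, prompt, t_max=999, eta=0.5)
print(np.linalg.norm(z0), np.linalg.norm(z1))  # same radius up to rounding
```

Because the tangential step is orthogonal to `z`, it can only lengthen the seed, and the final rescaling maps it back to radius `||z||`. This is the sense in which the shaped seed stays on the original Gaussian radius shell while its direction moves toward the prompt residual.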
Song Yan, Wei Zhai, Chenfeng Wang, Xinliang Bi, Jian Yang, Yancheng Cai, Yusen Zhang, Yunwei Lan, Tao Zhang, GuanYe Xiong, Min Li, Zheng-Jun Zha• 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Text-to-Image Generation | Pick-a-Pic | PickScore: 17.5612 | 150 |
| Video Generation | VBench | -- | 126 |
| Text-to-Image Generation | HPD | PickScore: 16.8347 | 38 |
| Image-to-3D | Toys4k | FD (Inception): 29.4028 | 11 |
| Text-to-Image Generation | DrawBench | PickScore: 17.597 | 7 |
| Text-to-3D Generation | User Study Text to 3D | Detailed Objects Score: 58.6 | 2 |
| Text-to-Image Generation | User Study T2I | Basic Objects & Colors: 92.5 | 2 |
| Text-to-Video Generation | User Study T2V | Dynamic Scenes Score: 0.664 | 2 |
