The Determinism of Randomness: Latent Space Degeneracy in Diffusion Models
About
Diffusion models draw the initial latent from an isotropic Gaussian distribution (all directions equally likely). But in practice, changing only the random seed can sharply alter image quality and prompt faithfulness. We explain this by distinguishing the isotropic prior from the semantics induced by the sampling map: while the prior is direction-agnostic, the mapping from latent noise to semantics has semantic-invariant directions and semantic-sensitive directions, so different seeds can lead to very different semantic outcomes. Motivated by this view, we propose a training-free inference procedure that (i) suppresses seed-specific, semantic-irrelevant variation via distribution-preserving semantic erasure, (ii) reinforces prompt-relevant semantic directions through timestep-aggregated horizontal injection, and (iii) applies a simple spherical retraction to stay near the prior's typical set. Across multiple backbones and benchmarks, our method consistently improves alignment and generation quality over standard sampling.
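The third step, spherical retraction, rests on a standard fact: for z ~ N(0, I_d), the norm ||z|| concentrates tightly around √d, so edited latents should be pulled back toward that shell to stay in the prior's typical set. The sketch below is a minimal illustration of this idea, not the paper's exact procedure; the function name and the `strength` interpolation parameter are our own assumptions.

```python
import numpy as np

def spherical_retraction(z, strength=1.0):
    """Pull a latent back toward the Gaussian typical-set shell.

    For z ~ N(0, I_d) the norm ||z|| concentrates around sqrt(d), so
    after editing the latent we rescale its norm toward that radius.
    `strength` in [0, 1] interpolates between no correction (0) and
    full projection onto the sqrt(d) sphere (1). Hypothetical sketch;
    the paper's retraction may differ in detail.
    """
    d = z.size
    norm = np.linalg.norm(z)
    target = np.sqrt(d)
    new_norm = (1.0 - strength) * norm + strength * target
    return z * (new_norm / norm)

# Usage: an edit (here, a naive rescale) pushes z off the typical
# shell; the retraction restores its norm to sqrt(d).
rng = np.random.default_rng(0)
z = rng.standard_normal(16384)      # d = 16384, e.g. a 128x128 latent
z_edit = z * 1.3                    # off-shell after editing
z_fix = spherical_retraction(z_edit)
```

With `strength=1.0` the result lies exactly on the √d sphere; smaller values give a softer correction, which can matter when the semantic edit itself carries meaningful norm information.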
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Generation | VBench | -- | -- | 102 |
| Text-to-Image Generation | Pick-a-Pic | PickScore | 17.5612 | 47 |
| Image-to-3D | Toys4k | FD (Inception) | 29.4028 | 8 |
| Text-to-Image Generation | DrawBench | PickScore | 17.597 | 7 |
| Text-to-Image Generation | HPD | PickScore | 16.8347 | 7 |
| Text-to-3D Generation | User Study Text to 3D | Detailed Objects Score | 58.6 | 2 |
| Text-to-Image Generation | User Study T2I | Basic Objects & Colors | 92.5 | 2 |
| Text-to-Video Generation | User Study T2V | Dynamic Scenes Score | 0.664 | 2 |