Determinism of Randomness: Prompt-Residual Seed Shaping for Diffusion Generation
About
Diffusion models start generation from an isotropic Gaussian latent, yet changing only the random seed can lead to large differences in prompt faithfulness, composition, and visual quality. We study this seed sensitivity through the semantic map from initial noise to generated meaning. Although the sampling flow is locally invertible, the subsequent semantic projection is many-to-one, inducing a degenerate pullback semi-metric on the latent space: most local directions are nearly semantic-invariant, while semantic-sensitive variation is concentrated in a much smaller horizontal subspace. This provides an explanatory geometric view of the seed lottery. Motivated by this view, we introduce a training-free prompt-residual seed-shaping procedure. Rather than claiming to recover the exact horizontal space, the method uses a single high-noise cold-start prompt residual as a model-coupled proxy, injects only its tangential component, and retracts the seed to the original Gaussian radius shell. This keeps the initialization prior-compatible while adding only one conditional/unconditional probe before standard sampling. Across multiple generation benchmarks, the method improves alignment and quality metrics over standard sampling, supporting both the practical value of the proxy and the explanatory relevance of semantic anisotropy.
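The seed-shaping step described above (probe residual, tangential projection, retraction to the radius shell) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `shape_seed`, the `step` parameter, its scaling by the seed radius, and the sign of the injected direction are all assumptions made for illustration. The two noise predictions are taken as precomputed arrays standing in for a single conditional/unconditional probe of the model at a high noise level.

```python
import numpy as np


def shape_seed(z, eps_cond, eps_uncond, step=0.1):
    """Hypothetical sketch of prompt-residual seed shaping.

    z          : initial Gaussian latent (any shape)
    eps_cond   : model prediction at high noise with the prompt
    eps_uncond : model prediction at high noise without the prompt
    step       : assumed step size, relative to the seed radius;
                 its sign and scale are illustrative choices
    """
    z_flat = z.ravel()
    residual = (eps_cond - eps_uncond).ravel()  # prompt residual

    radius = np.linalg.norm(z_flat)
    unit = z_flat / radius

    # Keep only the component of the residual tangent to the sphere at z,
    # i.e. remove the part that would change the seed's norm.
    tangent = residual - np.dot(residual, unit) * unit
    t_norm = np.linalg.norm(tangent)
    if t_norm < 1e-12:
        return z.copy()  # no usable tangential direction

    # Inject the tangential direction, then retract to the original radius.
    moved = z_flat + step * radius * tangent / t_norm
    shaped = moved * (radius / np.linalg.norm(moved))
    return shaped.reshape(z.shape)
```

In this sketch the shaped seed has exactly the same norm as the original draw, which is one way to read "retracts the seed to the original Gaussian radius shell"; the shaped latent would then be passed to the standard sampler in place of `z`.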
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Text-to-Image Generation | Pick-a-Pic | PickScore | 17.5612 | 150 |
| Video Generation | VBench | -- | -- | 126 |
| Text-to-Image Generation | HPD | PickScore | 16.8347 | 38 |
| Image-to-3D | Toys4k | FD (Inception) | 29.4028 | 11 |
| Text-to-Image Generation | DrawBench | PickScore | 17.597 | 7 |
| Text-to-3D Generation | User Study Text to 3D | Detailed Objects Score | 58.6 | 2 |
| Text-to-Image Generation | User Study T2I | Basic Objects & Colors | 92.5 | 2 |
| Text-to-Video Generation | User Study T2V | Dynamic Scenes Score | 0.664 | 2 |