
Spectral Collapse in Diffusion Inversion

About

Conditional diffusion inversion provides a powerful framework for unpaired image-to-image translation. However, we demonstrate through an extensive analysis that standard deterministic inversion (e.g., DDIM) fails when the source domain is spectrally sparse compared to the target domain (e.g., super-resolution, sketch-to-image). In these contexts, the latent recovered from the input does not follow the expected isotropic Gaussian distribution. Instead, it is biased toward lower frequencies, which locks target sampling into oversmoothed, texture-poor generations. We term this phenomenon spectral collapse. We observe that stochastic alternatives, which attempt to restore the noise variance, tend to break the semantic link to the input and lead to structural drift. To resolve this structure-texture trade-off, we propose Orthogonal Variance Guidance (OVG), an inference-time method that corrects the ODE dynamics to enforce the theoretical Gaussian noise magnitude within the null-space of the structural gradient. Extensive experiments on microscopy super-resolution (BBBC021) and sketch-to-image (Edges2Shoes) demonstrate that OVG effectively restores photorealistic textures while preserving structural fidelity.
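
The two ideas in the abstract can be illustrated with a small NumPy sketch. It is not the authors' OVG implementation: the function names, the 0.25 cycles/pixel cutoff, and the synthetic low-pass "collapsed" latent are all assumptions made for illustration. The first function measures how much of a latent's spectral power lies at high frequencies, a simple way to see whether a latent looks like white Gaussian noise or has lost its high-frequency content. The second shows the plain linear algebra of removing the component of an update along a given gradient direction, which is what acting "within the null-space" of that gradient means for a single vector.

```python
import numpy as np


def high_freq_energy_fraction(latent, cutoff=0.25):
    """Fraction of spectral power at radial frequencies above `cutoff`.

    `cutoff` is in cycles/pixel (hypothetical default). White Gaussian noise
    spreads power evenly over frequencies, so most of it lies above the cutoff;
    a low-frequency-dominated latent yields a much smaller fraction.
    """
    h, w = latent.shape
    power = np.abs(np.fft.fft2(latent)) ** 2
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    return float(power[radius > cutoff].sum() / power.sum())


def project_orthogonal(update, structural_grad, eps=1e-12):
    """Remove from `update` its component along `structural_grad`.

    The result is orthogonal to the gradient, so to first order it does not
    change the quantity that the gradient belongs to.
    """
    g = structural_grad.ravel()
    u = update.ravel()
    coeff = np.dot(u, g) / (np.dot(g, g) + eps)
    return (u - coeff * g).reshape(update.shape)


rng = np.random.default_rng(0)

# Reference: an isotropic Gaussian latent.
white = rng.standard_normal((64, 64))

# Synthetic stand-in for a "collapsed" latent: the same noise with all
# frequencies above 0.1 cycles/pixel zeroed out (an assumption, not real data).
spec = np.fft.fft2(white)
fy = np.fft.fftfreq(64)[:, None]
fx = np.fft.fftfreq(64)[None, :]
spec[np.sqrt(fx ** 2 + fy ** 2) > 0.1] = 0
smooth = np.real(np.fft.ifft2(spec))

white_frac = high_freq_energy_fraction(white)
smooth_frac = high_freq_energy_fraction(smooth)

# A random "update" projected so that it is orthogonal to a random "gradient".
grad = rng.standard_normal((64, 64))
update = rng.standard_normal((64, 64))
projected = project_orthogonal(update, grad)
residual = float(np.dot(projected.ravel(), grad.ravel()))
```

Here `white_frac` is large and `smooth_frac` is close to zero, while `residual` is numerically zero. A real diagnostic would apply the first function to latents obtained from an actual inversion run rather than to filtered noise.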

Nicolas Bourriez, Alexandre Verine, Auguste Genovesio • 2026

Related benchmarks

Task | Dataset | Result | Rank
Diffusion Inversion | edges2shoes | PSNR 9.38 | 9
Diffusion Inversion | BBBC021 x16 | PSNR 16.9 | 9
