Be Tangential to Manifold: Discovering Riemannian Metric for Diffusion Models
About
Diffusion models are powerful deep generative models, but unlike classical models, they lack an explicit low-dimensional latent space that parameterizes the data manifold. This absence makes it difficult to perform manifold-aware operations, such as geometrically faithful interpolation or conditional guidance that respects the learned manifold. We propose a training-free Riemannian metric on the noise space, derived from the Jacobian of the score function. The key insight is that the spectral structure of this Jacobian separates tangent and normal directions of the data manifold; our metric leverages this separation to encourage paths to stay tangential to the manifold rather than drift toward high-density regions. To validate that our metric faithfully captures the manifold geometry, we examine it from two complementary angles. First, geodesics under our metric yield perceptually more natural interpolations than existing methods on synthetic, image, and video frame datasets. Second, the tangent-normal decomposition induced by our metric prevents classifier-free guidance from deviating off the manifold, improving generation quality while preserving text-image alignment.
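The tangent/normal separation described above can be illustrated on a toy problem. The sketch below is not the paper's implementation: it assumes a hand-written 2D density concentrated near the unit circle (thickness `SIGMA`), estimates the score Jacobian by finite differences, and uses an illustrative metric `G = JᵀJ + I` that may differ from the metric the authors define. The function names (`score`, `score_jacobian`, `tangent_normal_split`, `toy_metric`) are hypothetical. It also works directly in data space, whereas the paper defines its metric on the noise space of a trained diffusion model.

```python
import numpy as np

SIGMA = 0.1  # assumed thickness of the toy manifold (a noisy unit circle)


def score(x):
    """Score of the unnormalized toy density exp(-(|x| - 1)^2 / (2 SIGMA^2))."""
    r = np.linalg.norm(x)
    return -(r - 1.0) / SIGMA**2 * x / r


def score_jacobian(x, eps=1e-5):
    """Central finite-difference Jacobian of the score at x."""
    d = x.size
    jac = np.zeros((d, d))
    for i in range(d):
        step = np.zeros(d)
        step[i] = eps
        jac[:, i] = (score(x + step) - score(x - step)) / (2 * eps)
    return jac


def tangent_normal_split(x):
    """Eigendecompose the symmetrized Jacobian to separate directions."""
    jac = score_jacobian(x)
    sym = 0.5 * (jac + jac.T)
    eigvals, eigvecs = np.linalg.eigh(sym)  # eigenvalues in ascending order
    normal = eigvecs[:, 0]    # most negative eigenvalue: normal direction
    tangent = eigvecs[:, -1]  # eigenvalue near zero: tangent direction
    return eigvals, tangent, normal


def toy_metric(x):
    """Illustrative metric (an assumption): G = J^T J + I."""
    jac = score_jacobian(x)
    return jac.T @ jac + np.eye(x.size)


# Evaluate at a point that lies exactly on the circle.
x = np.array([np.cos(0.7), np.sin(0.7)])
eigvals, tangent, normal = tangent_normal_split(x)
G = toy_metric(x)
tangent_cost = tangent @ G @ tangent  # squared length of a unit tangent step
normal_cost = normal @ G @ normal     # squared length of a unit normal step
print("eigenvalues:", eigvals)
print("tangent cost:", tangent_cost, "normal cost:", normal_cost)
```

In this toy setting the Jacobian has one eigenvalue close to `-1 / SIGMA**2` along the radial (normal) direction and one close to zero along the circle (tangent). Under the illustrative metric, a unit step off the circle is therefore far more expensive than a unit step along it, which is the kind of behavior that encourages geodesics to stay tangential to the manifold.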
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Text-to-Image Generation | MS-COCO | FID 11.53 | 131 |
| Video Frame Interpolation | DAVIS | -- | 33 |
| Image interpolation | MorphBench (A) | PPL 0.38 | 14 |
| Image interpolation | MorphBench (M) | PPL 0.974 | 14 |
| Image interpolation | CelebA-HQ (CA) | PPL 0.632 | 14 |
| Image interpolation | AF | PPL 0.761 | 14 |
| Video Frame Interpolation | Human | MSE 2.018 | 14 |
| Video Frame Interpolation | RE10K | MSE 2.58 | 14 |
| Path Interpolation | Synthetic C-shaped 2D distribution | Standard Deviation 0.0701 | 4 |