Structured State-Space Regularization for Generation-Friendly Image Tokenization
About
Image tokenizers play a central role in modern generative models, where the structure of the latent space critically determines downstream generation performance. A key but underexplored property of effective latent representations is spectral organization: the ability to encode information across frequency components. In this work, we introduce structured state-space regularization, a principled approach to inducing spectral structure in latent spaces. We derive the regularization objective by revisiting state-space models (SSMs) as systems that mimic the behavior of a basis function. This perspective reveals that the hidden states of SSMs are driven to capture frequency components, which yields a novel regularizer that encourages the latent space to capture the spectral structure of images. Experiments demonstrate that our regularizer improves the generative performance of image tokenizers while incurring only a minimal loss in reconstruction fidelity.
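The claim that SSM hidden states can capture frequency components can be illustrated with a toy example. The sketch below is not the regularizer from this work: it assumes a diagonal linear recurrence with unit-modulus poles (a Fourier basis, chosen here purely for illustration), and the function name `diagonal_ssm_final_state` is hypothetical. Under those assumptions, the magnitude of each final hidden state equals the magnitude of the corresponding DFT coefficient of the input.

```python
import numpy as np

def diagonal_ssm_final_state(x, omegas):
    """Run h_t = lam * h_{t-1} + x_t independently per frequency.

    Hypothetical helper for illustration only; lam = exp(i * omega) is an
    assumed Fourier-type parameterization, not the paper's construction.
    """
    lam = np.exp(1j * omegas)        # one unit-modulus pole per frequency
    h = np.zeros_like(lam)           # complex hidden state, one entry per pole
    for x_t in x:
        h = lam * h + x_t            # linear state update, input weight 1
    return h

n = 64
t = np.arange(n)
# Toy 1-D "latent" signal with energy at frequency bins 5 and 12.
signal = np.cos(2 * np.pi * 5 * t / n) + 0.5 * np.cos(2 * np.pi * 12 * t / n)

omegas = 2 * np.pi * np.arange(n) / n
states = diagonal_ssm_final_state(signal, omegas)
dft = np.fft.fft(signal)

# The final states match the DFT up to a unit-modulus phase factor,
# so their magnitudes agree.
print(np.allclose(np.abs(states), np.abs(dft)))
```

Unrolling the recurrence gives h_N = lam^(N-1) * sum_t x_t * lam^(-t), which is the DFT coefficient multiplied by a phase of modulus one; this is why only the magnitudes are compared.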
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Reconstruction | ImageNet-1K 1.0 (val) | rFID 0.91 | 35 |
| Image Generation | ImageNet-1K 1.0 (val) | FID 7.29 | 17 |