Image Generation with a Sphere Encoder
About
We introduce the Sphere Encoder, an efficient generative framework capable of producing images in a single forward pass and competing with many-step diffusion models using fewer than five steps. Our approach works by learning an encoder that maps natural images uniformly onto a spherical latent space, and a decoder that maps random latent vectors back to the image space. Trained solely through image reconstruction losses, the model generates an image by simply decoding a random point on the sphere. Our architecture naturally supports conditional generation, and looping the encoder/decoder a few times can further enhance image quality. Across several datasets, the sphere encoder approach yields performance competitive with state of the art diffusions, but with a small fraction of the inference cost. Project page is available at https://sphere-encoder.github.io .
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Image Generation | ImageNet 256x256 | -- | 243 | |
| Unconditional Image Generation | CIFAR-10 | -- | 171 | |
| Conditional Image Generation | CIFAR-10 | -- | 71 | |
| Conditional Image Generation | Oxford Flowers | gFID10.63 | 8 | |
| Unconditional Image Generation | Animal Faces | gFID17.97 | 7 |