Image Generation with a Sphere Encoder

About

We introduce the Sphere Encoder, an efficient generative framework that can produce images in a single forward pass and compete with many-step diffusion models using fewer than five steps. Our approach learns an encoder that maps natural images uniformly onto a spherical latent space, and a decoder that maps random latent vectors back to image space. Trained solely through image reconstruction losses, the model generates an image by simply decoding a random point on the sphere. The architecture naturally supports conditional generation, and looping the encoder/decoder a few times can further enhance image quality. Across several datasets, the Sphere Encoder yields performance competitive with state-of-the-art diffusion models at a small fraction of the inference cost. The project page is available at https://sphere-encoder.github.io.
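The generation procedure described above (decode a random point on the sphere, then optionally loop the encoder/decoder a few times) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the authors' implementation: `ToyAutoencoder` is a hypothetical stand-in that uses fixed random linear maps in place of trained networks, the unit radius and the latent and image sizes are arbitrary choices, and the looping rule (re-encode the output, then decode again) is one plausible reading of the abstract.

```python
import numpy as np


def project_to_sphere(z, radius=1.0, eps=1e-12):
    """Rescale each row of z to lie on the sphere of the given radius."""
    return radius * z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)


def sample_on_sphere(rng, n, dim, radius=1.0):
    """Draw n points uniformly on the sphere in R^dim.

    Normalizing an isotropic Gaussian sample gives a uniform direction.
    """
    return project_to_sphere(rng.standard_normal((n, dim)), radius)


class ToyAutoencoder:
    """Hypothetical stand-in for a trained encoder/decoder pair.

    Random linear maps are used only so that the sketch runs end to end;
    a real model would use learned networks.
    """

    def __init__(self, rng, image_dim, latent_dim):
        self.w_dec = rng.standard_normal((latent_dim, image_dim)) / np.sqrt(latent_dim)
        self.w_enc = rng.standard_normal((image_dim, latent_dim)) / np.sqrt(image_dim)

    def decode(self, z):
        # Map latents to flattened "images" with values in [-1, 1].
        return np.tanh(z @ self.w_dec)

    def encode(self, x):
        # Map images to latents, then project back onto the sphere.
        return project_to_sphere(x @ self.w_enc)


def generate(model, rng, n, latent_dim, steps=1):
    """Decode random sphere points; extra steps loop encoder -> decoder."""
    x = model.decode(sample_on_sphere(rng, n, latent_dim))
    for _ in range(steps - 1):
        x = model.decode(model.encode(x))
    return x


rng = np.random.default_rng(0)
model = ToyAutoencoder(rng, image_dim=48, latent_dim=16)
one_step = generate(model, rng, n=4, latent_dim=16, steps=1)
few_step = generate(model, rng, n=4, latent_dim=16, steps=4)
```

With `steps=1` this corresponds to single-pass generation; larger values of `steps` correspond to the few-step refinement loop, which costs one encoder and one decoder pass per extra step.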

Kaiyu Yue, Menglin Jia, Ji Hou, Tom Goldstein • 2026

Related benchmarks

Task	Dataset	Result
Image Generation	ImageNet 256x256	IS 301.8	517
Unconditional Image Generation	CIFAR-10	--	280
Conditional Image Generation	CIFAR-10	--	88
Class-conditional Image Generation	ImageNet-1k (val)	FID 4.02	79
Class-conditional Image Generation	ImageNet-1k (train)	FID 4.02	31
Conditional Image Generation	Oxford Flowers	gFID 10.63	8
Unconditional Image Generation	Animal Faces	gFID 17.97	7
Image Generation	Animal Faces	gFID 17.97	6

