Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis

About

Any-scale image synthesis offers an efficient and scalable solution to synthesize photo-realistic images at any scale, even going beyond 2K resolution. However, existing GAN-based solutions depend excessively on convolutions and a hierarchical architecture, which introduce inconsistency and the "texture sticking" issue when scaling the output resolution. From another perspective, INR-based generators are scale-equivariant by design, but their huge memory footprint and slow inference hinder these networks from being adopted in large-scale or real-time systems. In this work, we propose Column-Row Entangled Pixel Synthesis (CREPS), a new generative model that is both efficient and scale-equivariant without using any spatial convolutions or coarse-to-fine design. To save memory footprint and make the system scalable, we employ a novel bi-line representation that decomposes layer-wise feature maps into separate "thick" column and row encodings. Experiments on various datasets, including FFHQ, LSUN-Church, MetFaces, and Flickr-Scenery, confirm CREPS' ability to synthesize scale-consistent and alias-free images at any arbitrary resolution with proper training and inference speed. Code is available at https://github.com/VinAIResearch/CREPS.
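
The bi-line idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the tensor shapes (a row encoding of C x H x D and a column encoding of C x D x W), the thickness value D = 8, the function names, and the choice of a matrix product over the thickness axis as the composition rule are all assumptions made here for illustration. The sketch only shows why storing two line encodings grows with H + W rather than with H x W.

```python
# Hypothetical sketch of a bi-line representation. Shapes, names, the thickness
# value and the composition rule are illustrative assumptions, not CREPS code.

def full_map_elements(channels, height, width):
    """Number of values in a dense C x H x W feature map."""
    return channels * height * width


def bi_line_elements(channels, height, width, thickness):
    """Number of values in a row encoding (C x H x D) plus a column encoding (C x D x W)."""
    return channels * thickness * (height + width)


def compose(row_enc, col_enc):
    """Combine one channel's row encoding (H x D) and column encoding (D x W)
    into an H x W map by summing over the thickness axis (a matrix product)."""
    height = len(row_enc)
    thickness = len(col_enc)
    width = len(col_enc[0])
    return [
        [sum(row_enc[i][k] * col_enc[k][j] for k in range(thickness))
         for j in range(width)]
        for i in range(height)
    ]


# Tiny single-channel example: H = 3, W = 4, thickness D = 2.
row_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
col_enc = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
feature = compose(row_enc, col_enc)

# Storage comparison at 1024 x 1024 with 512 channels and an assumed thickness of 8.
dense = full_map_elements(512, 1024, 1024)
bi_line = bi_line_elements(512, 1024, 1024, 8)
```

Under these assumed sizes the dense map holds 536,870,912 values while the two line encodings hold 8,388,608, a 64-fold reduction. The actual savings in CREPS depend on its real layer widths and thickness, which are not stated here.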

Thuan Hoang Nguyen, Thanh Van Le, Anh Tran • 2023

Related benchmarks

Task	Dataset	Result
Unconditional Image Generation	LSUN Church 256x256	FID 5.5	14
Unconditional image synthesis	FFHQ 1024	FID 4.09	12
Image Synthesis	FFHQ 1024 (test)	FID (50k) 4.09	9
Image Synthesis	LSUN Church 256x256 (test)	FID 5.5	6
Image Synthesis	FFHQ 512 (test)	FID 4.43	3
Unconditional image synthesis	FFHQ 512	FID 4.43	3
Unconditional image synthesis	Scenery-256	FID 7.21	3
Unconditional image synthesis	MetFaces 1024	FID 20.52	2

