
Designing an Encoder for StyleGAN Image Manipulation

About

Recently, there has been a surge of diverse methods for performing image editing by employing pre-trained unconditional generators. Applying these methods on real images, however, remains a challenge, as it necessarily requires the inversion of the images into their latent space. To successfully invert a real image, one needs to find a latent code that reconstructs the input image accurately, and more importantly, allows for its meaningful manipulation. In this paper, we carefully study the latent space of StyleGAN, the state-of-the-art unconditional generator. We identify and analyze the existence of a distortion-editability tradeoff and a distortion-perception tradeoff within the StyleGAN latent space. We then suggest two principles for designing encoders in a manner that allows one to control the proximity of the inversions to regions that StyleGAN was originally trained on. We present an encoder based on our two principles that is specifically designed for facilitating editing on real images by balancing these tradeoffs. By evaluating its performance qualitatively and quantitatively on numerous challenging domains, including cars and horses, we show that our inversion method, followed by common editing techniques, achieves superior real-image editing quality, with only a small reconstruction accuracy drop.

Omer Tov, Yuval Alaluf, Yotam Nitzan, Or Patashnik, Daniel Cohen-Or • 2021

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Reconstruction | FFHQ No glasses | LPIPS 0.191 | 18 |
| Image Reconstruction | FFHQ Glasses | LPIPS 0.194 | 18 |
| Facial Attribute Editing | DISFA (test) | Distance 0.278 | 16 |
| Attribute Classification | FFHQ (test) | Accuracy 95.6 | 15 |
| Image Editing (Add glasses) | FFHQ (test) | ID-Sim 0.577 | 15 |
| Image Editing (Remove glasses) | FFHQ (test) | ID-Sim 0.592 | 15 |
| Face image reconstruction | CelebA-HQ (test) | MAE 0.092 | 13 |
| Real image projection | CelebA-HQ (test) | MSE 0.05 | 9 |
| 3D GAN Inversion | MEAD (novel views) | LPIPS (±60°) 0.318 | 7 |
| 3D GAN Inversion | FFHQ + LPFF (test) | L2 Loss 0.064 | 7 |
Showing 10 of 15 rows
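
Several of the results listed here (MAE, MSE) are pixel-wise reconstruction errors between an input image and its inversion. As a rough illustration of what those two numbers measure, the following minimal sketch computes them over flat lists of pixel values. The function names, the [0, 1] pixel range, and the sample values are assumptions made for illustration, not the evaluation protocol behind any benchmark on this page. Perceptual and identity metrics such as LPIPS and ID-Sim rely on pretrained networks and are not covered.

```python
def _check_same_length(a, b):
    """Reject empty or mismatched inputs before computing an error."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(a, b):
    """Mean absolute error between two flat sequences of pixel values."""
    _check_same_length(a, b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mse(a, b):
    """Mean squared error between two flat sequences of pixel values."""
    _check_same_length(a, b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical 4-pixel "images" with values assumed to lie in [0, 1].
original = [0.0, 0.5, 1.0, 0.25]
reconstruction = [0.1, 0.5, 0.8, 0.25]

print("MAE:", mae(original, reconstruction))
print("MSE:", mse(original, reconstruction))
```

Lower is better for both. MSE penalizes large per-pixel deviations more heavily than MAE, which is one reason the two can rank methods differently.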
