
Pivotal Tuning for Latent-based Editing of Real Images

About

Recently, a surge of advanced facial editing techniques has been proposed that leverage the generative power of a pre-trained StyleGAN. To successfully edit an image this way, one must first project (or invert) the image into the pre-trained generator's domain. As it turns out, however, StyleGAN's latent space induces an inherent tradeoff between distortion and editability, i.e., between maintaining the original appearance and convincingly altering some of its attributes. Practically, this means it is still challenging to apply ID-preserving facial latent-space editing to faces that are out of the generator's domain. In this paper, we present an approach to bridge this gap. Our technique slightly alters the generator, so that an out-of-domain image is faithfully mapped into an in-domain latent code. The key idea is pivotal tuning: a brief training process that preserves the editing quality of an in-domain latent region, while changing its portrayed identity and appearance. In Pivotal Tuning Inversion (PTI), an initial inverted latent code serves as a pivot, around which the generator is fine-tuned. At the same time, a regularization term keeps nearby identities intact, to locally contain the effect. This surgical training process ends up altering appearance features that represent mostly identity, without affecting editing capabilities. We validate our technique through inversion and editing metrics, and show preferable scores to state-of-the-art methods. We further qualitatively demonstrate our technique by applying advanced edits (such as pose, age, or expression) to numerous images of well-known and recognizable identities. Finally, we demonstrate resilience to harder cases, including heavy make-up, elaborate hairstyles and/or headwear, which otherwise could not have been successfully inverted and edited by state-of-the-art methods.
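The two-phase recipe in the abstract (invert to a pivot with a frozen generator, then fine-tune the generator around that pivot under a locality regularizer) maps naturally to code. Below is a minimal PyTorch sketch under stated assumptions, not the authors' implementation: the generator `G` (mapping a W-space code to an image), the latent sampler `sample_w`, the perceptual loss `lpips_fn` (e.g., from the `lpips` package), and all step counts, learning rates, and weights (`lambda_reg`, `alpha`) are illustrative placeholders. The loss structure, a perceptual term plus pixel-wise L2 at the pivot with an analogous drift penalty on latent codes interpolated toward the pivot, follows the process described above.

```python
import copy

import torch
import torch.nn.functional as F


def invert_to_pivot(G, w_init, target, lpips_fn, steps=450, lr=5e-3):
    """Phase 1: with G frozen, optimize a latent code w_p (the pivot)
    so that G(w_p) reconstructs the target image."""
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        img = G(w)
        loss = lpips_fn(img, target).mean() + F.mse_loss(img, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()


def pivotal_tuning(G, w_pivot, target, lpips_fn, sample_w,
                   steps=350, lr=3e-4, lambda_reg=0.1, alpha=30.0):
    """Phase 2: fine-tune G so that G(w_pivot) matches the target,
    while a locality regularizer keeps G's output on latent codes
    near the pivot close to that of the frozen original generator."""
    G_orig = copy.deepcopy(G).eval()
    for p in G_orig.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(G.parameters(), lr=lr)
    for _ in range(steps):
        # Reconstruction terms at the pivot.
        img = G(w_pivot)
        loss = lpips_fn(img, target).mean() + F.mse_loss(img, target)

        # Locality regularization (illustrative form): draw a random
        # latent code, step a distance alpha from the pivot toward it,
        # and penalize drift between the tuned and frozen generators.
        w_z = sample_w()
        direction = (w_z - w_pivot) / (w_z - w_pivot).norm()
        w_r = w_pivot + alpha * direction
        img_r = G(w_r)
        with torch.no_grad():
            img_r_orig = G_orig(w_r)
        drift = (lpips_fn(img_r, img_r_orig).mean()
                 + F.mse_loss(img_r, img_r_orig))
        loss = loss + lambda_reg * drift

        opt.zero_grad()
        loss.backward()
        opt.step()
    return G
```

Note that in this sketch only the generator's weights change in phase 2; the pivot code itself stays fixed, which is what keeps the surrounding latent region, and hence its editing directions, usable after tuning.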

Daniel Roich, Ron Mokady, Amit H. Bermano, Daniel Cohen-Or • 2021

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| 3D Reconstruction | AnimeRecon 1.0 (test) | Front CLIP Score | 89.9 | 9 |
| Face Reconstruction | CelebA-HQ | MSE | 0.0084 | 8 |
| Reconstruction | OOD videos | LPIPS | 0.3144 | 8 |
| Reconstruction | OOD videos Images | LPIPS | 0.3192 | 8 |
| Identity Preservation | Face Images OOD | Accuracy (eyeglasses) | 91.14 | 8 |
| Identity Preservation | OOD Face Videos | Eyeglasses Consistency | 90.49 | 8 |
| Novel View Synthesis | CelebA-HQ | ID Similarity | 65.7 | 7 |
| 3D GAN Inversion | FFHQ + LPFF (test) | L2 Loss | 0.019 | 7 |
| 3D GAN Inversion | CelebAHQ (test) | L2 Error | 0.033 | 7 |
| 3D GAN Inversion | MEAD (novel views) | LPIPS (±60°) | 0.346 | 7 |

Showing 10 of 13 rows.
