IDE-3D: Interactive Disentangled Editing for High-Resolution 3D-aware Portrait Synthesis
About
Existing 3D-aware facial generation methods face a dilemma in quality versus editability: they either generate editable results in low resolution or high-quality ones with no editing flexibility. In this work, we propose a new approach that brings the best of both worlds together. Our system consists of three major components: (1) a 3D-semantics-aware generative model that produces view-consistent, disentangled face images and semantic masks; (2) a hybrid GAN inversion approach that initializes the latent codes from the semantic and texture encoders and further optimizes them for faithful reconstruction; and (3) a canonical editor that enables efficient manipulation of semantic masks in the canonical view and produces high-quality editing results. Our approach supports many applications, e.g., free-view face drawing, editing, and style control. Both quantitative and qualitative results show that our method reaches the state of the art in terms of photorealism, faithfulness, and efficiency.
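The hybrid inversion idea in component (2), an encoder-based initial guess followed by optimization, can be illustrated with a toy sketch. This is not the IDE-3D implementation: the "generator" here is an assumed fixed linear map, the "encoder" is a crude one-shot estimate, and every name (`generate`, `encode`, `refine`, `hybrid_invert`) is hypothetical. It only shows the two-stage pattern of initializing a latent code and then refining it by gradient descent on a reconstruction loss.

```python
# Toy illustration of hybrid inversion: encoder initialization + optimization.
# Assumption: a linear map stands in for the generator; a real GAN is nonlinear.

def generate(weights, latent):
    """Toy 'generator': output = weights @ latent."""
    return [sum(w * z for w, z in zip(row, latent)) for row in weights]

def encode(weights, target):
    """Toy 'encoder': a rough one-shot guess, weights^T @ target / num_rows."""
    rows, dim = len(weights), len(weights[0])
    return [sum(weights[i][j] * target[i] for i in range(rows)) / rows
            for j in range(dim)]

def loss(weights, latent, target):
    """Squared reconstruction error between generate(latent) and target."""
    recon = generate(weights, latent)
    return sum((r - t) ** 2 for r, t in zip(recon, target))

def refine(weights, latent, target, steps=200, lr=0.01):
    """Plain gradient descent on the reconstruction loss, starting from latent."""
    latent = list(latent)
    rows = len(weights)
    for _ in range(steps):
        recon = generate(weights, latent)
        residual = [r - t for r, t in zip(recon, target)]
        # Gradient of the squared error with respect to each latent coordinate.
        grad = [2.0 * sum(weights[i][j] * residual[i] for i in range(rows))
                for j in range(len(latent))]
        latent = [z - lr * g for z, g in zip(latent, grad)]
    return latent

def hybrid_invert(weights, target, steps=200, lr=0.01):
    """Stage 1: encoder initialization. Stage 2: optimization-based refinement."""
    init = encode(weights, target)
    return init, refine(weights, init, target, steps, lr)

# Hypothetical 6x3 'generator' and a target produced from a known latent code.
WEIGHTS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
           [0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
TARGET = generate(WEIGHTS, [0.5, -1.0, 2.0])

init_code, refined_code = hybrid_invert(WEIGHTS, TARGET)
print("loss after encoder init:", loss(WEIGHTS, init_code, TARGET))
print("loss after refinement:  ", loss(WEIGHTS, refined_code, TARGET))
```

In this toy setting the encoder guess alone leaves a visible reconstruction error, and the refinement stage reduces it. The encoder supplies a cheap starting point, while the optimization recovers fidelity.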
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Identity Preservation | Face Images OOD | Accuracy (eyeglasses) | 88.11 | 8 |
| Identity Preservation | OOD Face Videos | Eyeglasses Consistency | 87.67 | 8 |
| Reconstruction | OOD videos Images | LPIPS | 0.5044 | 8 |
| Reconstruction | OOD videos | LPIPS | 0.4999 | 8 |
| Novel View Synthesis | CelebA-HQ | ID Similarity | 67.1 | 7 |
| 3D-aware Portrait Synthesis | FFHQ 512x512 (train test) | FID | 4.6 | 5 |
| 3D-aware Portrait Synthesis | CelebAHQ-Mask 512x512 (test) | FID | 4.9 | 4 |
| GAN Inversion | CelebA-HQ 1500 images (test) | PSNR | 26.45 | 4 |