Instant Volumetric Head Avatars
About
We present Instant Volumetric Head Avatars (INSTA), a novel approach for reconstructing photo-realistic digital avatars instantaneously. INSTA models a dynamic neural radiance field based on neural graphics primitives embedded around a parametric face model. Our pipeline is trained on a single monocular RGB portrait video that observes the subject under different expressions and views. While state-of-the-art methods take up to several days to train an avatar, our method can reconstruct a digital avatar in less than 10 minutes on modern GPU hardware, which is orders of magnitude faster than previous solutions. In addition, it allows for the interactive rendering of novel poses and expressions. By leveraging the geometry prior of the underlying parametric face model, we demonstrate that INSTA extrapolates to unseen poses. In quantitative and qualitative studies on various subjects, INSTA outperforms state-of-the-art methods regarding rendering quality and training time.
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Self-Reenactment | INSTA | PSNR 27.85 | 14 |
| Self-Reenactment | HDTF | PSNR 25.03 | 14 |
| Monocular 3D Head Avatar Creation | NeRSemble | PSNR 15.8 | 8 |
| Head Avatar Rendering | INSTA | Inverse MAE 76.4 | 7 |
| Self-Reenactment | self-captured dataset | PSNR 25.91 | 6 |
| Facial cross-person reenactment | Facial cross-person reenactment dataset | E_feat_cos 0.9087 | 5 |
| Head Avatar Rendering | Monocular video for head avatar | PSNR 26.42 | 5 |
| 3D Head Avatar Reconstruction | Monocular RGB videos (test) | LPIPS 0.149 | 5 |
| Novel expression and view synthesis | NeRSemble (novel expressions and views) | PSNR 27.9181 | 5 |
| Novel View Synthesis | NeRSemble (novel-view split) | PSNR 27.7786 | 5 |
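
Most rows above report PSNR (peak signal-to-noise ratio, in dB, higher is better). As a rough guide to reading those numbers, the following is a minimal sketch of the standard PSNR definition. It assumes pixel values normalized to [0, 1] and flat sequences of floats. The function name `psnr` and the `max_value` parameter are illustrative, and the exact evaluation protocol behind each benchmark (cropping, masking, color range) is not stated on this page.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Standard PSNR in dB between two equally sized flat pixel sequences.

    Assumption: values lie in [0, max_value]; this is not taken from any
    specific benchmark's evaluation code.
    """
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - p) ** 2 for r, p in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Each pixel is off by 0.1, so the MSE is about 0.01.
    print(psnr([0.0, 1.0], [0.1, 0.9]))
```

With a per-pixel error of 0.1 on a [0, 1] scale, the mean squared error is about 0.01 and the PSNR is about 20 dB. Each additional 10 dB corresponds to a tenfold reduction in mean squared error.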