Drivable Volumetric Avatars using Texel-Aligned Features
About
Photorealistic telepresence requires both high-fidelity body modeling and faithful driving to enable dynamically synthesized appearance that is indistinguishable from reality. In this work, we propose an end-to-end framework that addresses two core challenges in modeling and driving full-body avatars of real people. One challenge is driving an avatar while staying faithful to details and dynamics that cannot be captured by a global low-dimensional parameterization such as body pose. Our approach supports driving of clothed avatars with wrinkles and motion that a real driving performer exhibits beyond the training corpus. Unlike existing global state representations or non-parametric screen-space approaches, we introduce texel-aligned features, a localized representation that can leverage both the structural prior of a skeleton-based parametric model and observed sparse image signals at the same time. Another challenge is modeling a temporally coherent clothed avatar, which typically requires precise surface tracking. To circumvent this, we propose a novel volumetric avatar representation by extending mixtures of volumetric primitives to articulated objects. By explicitly incorporating articulation, our approach naturally generalizes to unseen poses. We also introduce localized viewpoint conditioning, which leads to a large improvement in generalization of view-dependent appearance. The proposed volumetric representation does not require high-quality mesh tracking as a prerequisite and brings significant quality improvements compared to mesh-based counterparts. In our experiments, we carefully examine our design choices and demonstrate the efficacy of our approach, outperforming state-of-the-art methods on challenging driving scenarios.
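To make the idea of localized viewpoint conditioning more concrete, here is a minimal sketch of one plausible reading: the view direction is computed per primitive and expressed in that primitive's own coordinate frame, instead of using a single global camera direction. The function name, the array layout, and the convention that each rotation maps local to world coordinates are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import numpy as np

def local_view_directions(cam_pos, prim_centers, prim_rotations):
    """Unit view direction per primitive, expressed in that primitive's local frame.

    Hypothetical helper (not from the paper):
      cam_pos        -- (3,)     camera position in world space
      prim_centers   -- (K, 3)   primitive centers in world space
      prim_rotations -- (K, 3, 3) rotations mapping local -> world (assumed convention)
    """
    # World-space direction from the camera to each primitive center, shape (K, 3).
    d = prim_centers - cam_pos[None, :]
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    # Rotate each direction into its primitive's frame: R_k^T d_k.
    return np.einsum("kji,kj->ki", prim_rotations, d)

# Two primitives seen from a camera at the origin. The second primitive is
# rotated 90 degrees about z and placed so that the camera sees it from the
# same relative side as the first one.
cam = np.zeros(3)
centers = np.array([[2.0, 0.0, 0.0],
                    [0.0, 3.0, 0.0]])
rot_z90 = np.array([[0.0, -1.0, 0.0],
                    [1.0,  0.0, 0.0],
                    [0.0,  0.0, 1.0]])
rotations = np.stack([np.eye(3), rot_z90])

local_dirs = local_view_directions(cam, centers, rotations)
print(local_dirs)
```

In this toy setup both primitives receive the same local direction even though their world-space directions differ, which illustrates why a local parameterization may generalize better across body poses than a global one.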
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Novel View Synthesis | ZJU-MoCap (test) | SSIM | 0.956 | 43 |
| Human Novel View Synthesis | ZJU-MoCap | PSNR | 29.45 | 31 |
| Novel View Synthesis | MonoCap (test) | PSNR | 32.99 | 17 |
| Human Novel View Synthesis | DNA-Rendering | PSNR | 29.8 | 7 |
| Human Novel-view Rendering | S22 4K | PSNR | 31.2019 | 6 |
| Novel Pose Synthesis | DynaCap Subject S1 - tight clothing 8 (test) | PSNR | 30.6 | 6 |
| Novel Pose Synthesis | DNA-Rendering (Novel poses) | PSNR | 28.8 | 6 |
| Novel View Synthesis | DNA-Rendering (Novel views) | PSNR | 29.8 | 6 |
| Human Novel-view Rendering | S3 4K | PSNR | 29.3681 | 6 |
| Human Novel-view Rendering | S22 1K | PSNR | 33.7842 | 6 |