Human-3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models
About
Creating realistic avatars from a single RGB image is an attractive yet challenging problem. Due to its ill-posed nature, recent works leverage powerful priors from 2D diffusion models pretrained on large datasets. Although 2D diffusion models demonstrate strong generalization capability, they cannot provide multi-view shape priors with guaranteed 3D consistency. We propose Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion. Our key insight is that 2D multi-view diffusion and 3D reconstruction models provide complementary information to each other, and by tightly coupling them, we can fully leverage the potential of both models. We introduce a novel image-conditioned generative 3D Gaussian Splats reconstruction model that leverages the priors from 2D multi-view diffusion models and provides an explicit 3D representation, which in turn guides the 2D reverse sampling process toward better 3D consistency. Experiments show that our proposed framework outperforms state-of-the-art methods and enables the creation of realistic avatars from a single RGB image, achieving high fidelity in both geometry and appearance. Extensive ablations also validate the efficacy of our design: (1) multi-view 2D priors conditioning in generative 3D reconstruction and (2) consistency refinement of the sampling trajectory via the explicit 3D representation. Our code and models will be released on https://yuxuan-xue.com/human-3diffusion.
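The coupling described above can be illustrated with a toy sampling loop. This is only a minimal sketch under strong assumptions, not the authors' implementation: `toy_denoiser_2d`, `toy_reconstruct_3d`, and `toy_render` are hypothetical stand-ins (the real models are neural networks and the real 3D representation is a set of Gaussian Splats), and the DDIM-style update and noise schedule are chosen here purely for illustration. The point it shows is the data flow: per-view 2D estimates are fused into one shared 3D state, re-rendered, and the re-rendered views replace the 2D estimates in the reverse step.

```python
import numpy as np

def ddim_step(x_t, x0_hat, alpha_t, alpha_prev):
    """Deterministic DDIM-style update given a clean-sample estimate x0_hat."""
    eps = (x_t - np.sqrt(alpha_t) * x0_hat) / np.sqrt(1.0 - alpha_t)
    return np.sqrt(alpha_prev) * x0_hat + np.sqrt(1.0 - alpha_prev) * eps

def toy_denoiser_2d(x_t, cond):
    # Hypothetical stand-in for a multi-view 2D diffusion model:
    # each view is estimated independently, so views may disagree.
    return 0.8 * cond + 0.2 * x_t

def toy_reconstruct_3d(x0_views):
    # Hypothetical stand-in for the generative 3D reconstruction model:
    # fuse all views into one shared state (here simply their mean).
    return x0_views.mean(axis=0)

def toy_render(state_3d, num_views):
    # Hypothetical stand-in for rendering the 3D state into every view.
    # Rendering from one shared state makes the views consistent by construction.
    return np.repeat(state_3d[None], num_views, axis=0)

def sample(cond, num_views=4, steps=10, seed=0):
    rng = np.random.default_rng(seed)
    # Assumed schedule: cumulative alpha decreases as t grows (noisier).
    alphas = np.linspace(0.999, 0.05, steps)
    x = rng.standard_normal((num_views,) + cond.shape)
    for t in reversed(range(steps)):
        alpha_t = alphas[t]
        alpha_prev = alphas[t - 1] if t > 0 else 1.0
        x0_2d = toy_denoiser_2d(x, cond)          # 2D multi-view prior
        state_3d = toy_reconstruct_3d(x0_2d)      # explicit 3D representation
        x0_3d = toy_render(state_3d, num_views)   # 3D-consistent estimate
        x = ddim_step(x, x0_3d, alpha_t, alpha_prev)  # guide the reverse step
    return x

views = sample(np.ones((8, 8)))
```

Because the last step uses `alpha_prev = 1.0`, the returned views equal the final rendered estimate, so in this toy setting all views agree exactly.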
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Human Texture Reconstruction | CustomHuman | LPIPS (Front) | 0.0569 | 21 |
| Human Texture Reconstruction | THuman 3.0 | LPIPS (Front) | 0.054 | 21 |
| Human Geometry Reconstruction | CustomHuman 16 | CD: P-to-S (cm) | 1.481 | 16 |
| Human Geometry Reconstruction | THuman3.0 49 | CD: P-to-S (cm) | 1.331 | 16 |
| 3D human reconstruction | THuman 2.1 (test) | PSNR | 17.329 | 16 |
| Single-view human reconstruction | CustomHuman (test) | NC | 58.72 | 15 |
| 3D human reconstruction | Computational Efficiency Evaluation | Inference Time | 2 | 13 |
| 3D human reconstruction | CustomHuman | PSNR | 33.75 | 9 |
| 3D human reconstruction | 2K2K | PSNR | 29.05 | 9 |
| 3D human reconstruction | 4D-DRESS | Chamfer Distance | 4.275 | 9 |
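
For readers unfamiliar with two of the metrics listed above, the following is a minimal sketch of PSNR and a point-based Chamfer distance using their common textbook definitions. The exact protocols behind the benchmark numbers (image range, point sampling, squared versus unsquared distances, averaging convention, true point-to-surface computation) are not stated here, so the choices below are assumptions and the function names are illustrative.

```python
import numpy as np

def psnr(img_a, img_b, max_val=1.0):
    """Peak signal-to-noise ratio in dB; images assumed to lie in [0, max_val]."""
    mse = np.mean((img_a - img_b) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

def one_sided_distance(src, dst):
    """Mean distance from each point in src (N, 3) to its nearest point in dst (M, 3)."""
    pairwise = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=-1)
    return pairwise.min(axis=1).mean()

def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance, here the average of both one-sided terms (assumed convention)."""
    return 0.5 * (one_sided_distance(points_a, points_b)
                  + one_sided_distance(points_b, points_a))

# Usage: a constant error of 0.1 on a unit-range image, and two single-point clouds.
psnr_value = psnr(np.zeros((4, 4)), np.full((4, 4), 0.1))
cd_value = chamfer_distance(np.array([[0.0, 0.0, 0.0]]), np.array([[3.0, 4.0, 0.0]]))
```

Higher PSNR is better, while lower Chamfer distance is better. The brute-force pairwise computation is quadratic in the number of points, so a spatial index would be preferable for dense scans.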