Bringing Your Portrait to 3D Presence
About
We present a unified framework for reconstructing animatable 3D human avatars from a single portrait, covering head, half-body, and full-body inputs. Our method tackles three bottlenecks: pose- and framing-sensitive feature representations, the scarcity of scalable training data, and unreliable proxy-mesh estimation. We introduce a Dual-UV representation that maps image features to a canonical UV space via Core-UV and Shell-UV branches, eliminating pose- and framing-induced token shifts. We also build a factorized synthetic data manifold that combines 2D generative diversity with geometry-consistent 3D renderings, supported by a training scheme that improves realism and identity consistency. A robust proxy-mesh tracker maintains stability under partial visibility. Together, these components enable strong in-the-wild generalization: trained only on half-body synthetic data, our model achieves state-of-the-art head and upper-body reconstruction and competitive full-body results. Extensive experiments and analyses further validate the effectiveness of our approach.
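The core intuition behind the Dual-UV representation is that image features are re-indexed by the proxy mesh's canonical UV coordinates rather than by image position, so the resulting tokens do not shift with pose or camera framing. The sketch below illustrates one plausible realization of that mapping; the function name `features_to_uv`, the tensor shapes, the scatter-based splatting, and the Core/Shell branch layout are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch, assuming: image features are splatted into a canonical UV
# grid via proxy-mesh vertex correspondences. Shapes and splatting scheme
# are illustrative, not the paper's implementation.
import torch
import torch.nn.functional as F


def features_to_uv(img_feat, verts_2d, verts_uv, uv_res=64):
    """Splat image features onto a canonical UV feature map.

    img_feat : (C, H, W)  backbone feature map of the portrait
    verts_2d : (V, 2)     proxy-mesh vertices projected to image space, in [-1, 1]
    verts_uv : (V, 2)     canonical UV coordinates of the same vertices, in [0, 1]
    """
    C = img_feat.shape[0]
    # Sample a feature vector at each projected proxy-mesh vertex.
    grid = verts_2d.view(1, -1, 1, 2)                       # (1, V, 1, 2)
    feat = F.grid_sample(img_feat[None], grid, align_corners=True)
    feat = feat.view(C, -1)                                  # (C, V)

    # Scatter vertex features into the UV grid, averaging collisions.
    uv_idx = (verts_uv.clamp(0, 1) * (uv_res - 1)).long()    # (V, 2)
    flat = uv_idx[:, 1] * uv_res + uv_idx[:, 0]              # (V,)
    uv_map = torch.zeros(C, uv_res * uv_res)
    count = torch.zeros(uv_res * uv_res)
    uv_map.index_add_(1, flat, feat)
    count.index_add_(0, flat, torch.ones(flat.shape[0]))
    uv_map = uv_map / count.clamp(min=1)
    return uv_map.view(C, uv_res, uv_res)                    # canonical UV tokens


# Both branches could share this mechanism with different UV layouts:
# a "Core-UV" map for the body surface and a "Shell-UV" map for an outer
# shell (e.g. hair/clothing) -- this split is an assumption for illustration.
img_feat = torch.randn(32, 64, 64)
verts_2d = torch.rand(500, 2) * 2 - 1
verts_uv = torch.rand(500, 2)
core_uv = features_to_uv(img_feat, verts_2d, verts_uv)
print(core_uv.shape)  # torch.Size([32, 64, 64])
```

Because the output grid is indexed by UV coordinates, the same body part always lands at the same token location regardless of how the subject is posed or cropped in the input image.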
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Human Reenactment | Upper-Wild (test) | PSNR | 21.09 | 4 |
| Human Reenactment | Upper-Wan (test) | PSNR | 20.38 | 4 |
| Human Reenactment | Full-Body (test) | PSNR | 24.53 | 3 |
| Human Reenactment | Head (test) | PSNR | 19.04 | 3 |