AvatarTex: High-Fidelity Facial Texture Reconstruction from Single-Image Stylized Avatars
About
We present AvatarTex, a high-fidelity facial texture reconstruction framework capable of generating both stylized and photorealistic textures from a single image. Existing methods struggle with stylized avatars because diverse multi-style datasets are lacking and because geometric consistency is hard to maintain in non-standard textures. To address these limitations, AvatarTex introduces a novel three-stage diffusion-to-GAN pipeline. Our key insight is that while diffusion models excel at generating diverse textures, they lack explicit UV constraints, whereas GANs provide a well-structured latent space that ensures style and topology consistency. By integrating these strengths, AvatarTex achieves high-quality, topology-aligned texture synthesis with both artistic and geometric coherence. Specifically, our three-stage pipeline first completes missing texture regions via diffusion-based inpainting, then refines style and structure consistency using GAN-based latent optimization, and finally enhances fine details through diffusion-based repainting. To address the need for a stylized texture dataset, we introduce TexHub, a high-resolution collection of 20,000 multi-style UV textures with precise UV-aligned layouts. By leveraging TexHub and our structured diffusion-to-GAN pipeline, AvatarTex establishes a new state of the art in multi-style facial texture reconstruction. TexHub will be released upon publication to facilitate future research in this field.
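The stage ordering described in the abstract can be pictured as a simple chain of texture-to-texture functions. The sketch below is only an illustration of that data flow: every name in it (`inpaint_missing`, `refine_in_latent_space`, `repaint_details`, `reconstruct`) is hypothetical, the toy texture is a small grayscale grid, and the stage bodies are trivial stand-ins rather than the diffusion and GAN models the paper uses.

```python
from typing import Callable, List, Optional

# Toy UV texture: a grid of grayscale values, with None marking texels
# that are not visible in the input image. (Assumed representation.)
Texture = List[List[Optional[float]]]
Stage = Callable[[Texture], Texture]


def inpaint_missing(tex: Texture) -> Texture:
    """Stage 1 stand-in: fill missing texels with the mean of the visible ones.

    The abstract describes diffusion-based inpainting here; this is NOT that.
    """
    known = [v for row in tex for v in row if v is not None]
    fill = sum(known) / len(known) if known else 0.0
    return [[fill if v is None else v for v in row] for row in tex]


def refine_in_latent_space(tex: Texture) -> Texture:
    """Stage 2 stand-in: pass-through copy.

    The abstract describes GAN-based latent optimization at this step.
    """
    return [row[:] for row in tex]


def repaint_details(tex: Texture) -> Texture:
    """Stage 3 stand-in: pass-through copy.

    The abstract describes diffusion-based repainting at this step.
    """
    return [row[:] for row in tex]


# Only this ordering is taken from the abstract.
PIPELINE: List[Stage] = [inpaint_missing, refine_in_latent_space, repaint_details]


def reconstruct(partial_uv: Texture) -> Texture:
    """Run the partial UV texture through each stage in order."""
    tex = partial_uv
    for stage in PIPELINE:
        tex = stage(tex)
    return tex
```

Keeping the stages as interchangeable callables mirrors the abstract's framing, in which each stage consumes and produces a UV-aligned texture.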
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Reconstruction | FFHQ (test) | -- | 36 |
| Facial Texture Reconstruction | FFHQ 1,000 images (test) | PSNR 30.03 | 4 |
| Facial Texture Reconstruction | LPFF 1,000 images (test) | PSNR 27.91 | 4 |
| Facial Texture Reconstruction | CANVAS 500 samples (test) | PSNR 23.93 | 4 |
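For readers unfamiliar with the metric in the table, PSNR (peak signal-to-noise ratio) is conventionally defined as 10·log10(peak² / MSE), where MSE is the mean squared error between the reference and the reconstruction. The following is a minimal sketch of that standard definition; the function name `psnr`, the flat-list input, and the default peak value of 255 (8-bit images) are assumptions for illustration, since the page does not state the evaluation protocol behind these numbers.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float],
         reconstruction: Sequence[float],
         peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    `peak` is the maximum possible pixel value (255 assumed for 8-bit data).
    """
    if len(reference) != len(reconstruction) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)
```

Higher values indicate a reconstruction closer to the reference, so under this definition the FFHQ result (30.03) reflects a smaller pixel-wise error than the CANVAS result (23.93).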