TADA! Text to Animatable Digital Avatars

About

We introduce TADA, a simple yet effective approach that takes textual descriptions and produces expressive 3D avatars with high-quality geometry and lifelike textures that can be animated and rendered with traditional graphics pipelines. Existing text-based character generation methods are limited in terms of geometry and texture quality, and cannot be realistically animated due to inconsistent alignment between the geometry and the texture, particularly in the face region. To overcome these limitations, TADA leverages the synergy of a 2D diffusion model and an animatable parametric body model. Specifically, we derive an optimizable high-resolution body model from SMPL-X with 3D displacements and a texture map, and use hierarchical rendering with score distillation sampling (SDS) to create high-quality, detailed, holistic 3D avatars from text. To ensure alignment between the geometry and texture, we render normals and RGB images of the generated character and exploit their latent embeddings in the SDS training process. We further introduce various expression parameters to deform the generated character during training, ensuring that the semantics of our generated character remain consistent with the original SMPL-X model, resulting in an animatable character. Comprehensive evaluations demonstrate that TADA significantly surpasses existing approaches on both qualitative and quantitative measures. TADA enables the creation of large-scale digital character assets that are ready for animation and rendering, while also being easily editable through natural language. The code will be made public for research purposes.
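Score distillation sampling is the step that turns a frozen 2D diffusion model into a training signal for 3D parameters. The sketch below is a toy illustration of the generic SDS update, not TADA's implementation: it assumes an identity "renderer" (so the rendered image is the parameter vector itself), replaces the text-conditioned diffusion model with a hypothetical closed-form noise predictor whose only mode is a fixed TARGET vector, and uses the weighting w(t) = 1 - alpha_bar, a common choice in SDS code. The names TARGET, toy_noise_predictor, render, sds_step and optimize are all hypothetical. In TADA the parameters would instead be the SMPL-X displacements and texture map, rendered hierarchically as normals and RGB.

```python
import math
import random

# Hypothetical stand-in for "what the text prompt describes". A real system
# never sees this directly; the frozen diffusion model encodes it implicitly.
TARGET = [0.2, -0.5, 0.9, 0.1]


def toy_noise_predictor(z_t, alpha_bar):
    """Hypothetical epsilon-predictor, exact for data concentrated at TARGET.

    eps_hat = (z_t - sqrt(alpha_bar) * TARGET) / sqrt(1 - alpha_bar)
    """
    s, r = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(zi - s * xi) / r for zi, xi in zip(z_t, TARGET)]


def render(theta):
    """Identity 'renderer' so dz/dtheta = I (a real pipeline rasterizes a mesh)."""
    return list(theta)


def sds_step(theta, lr, rng):
    """One score distillation sampling update on the parameters theta."""
    z = render(theta)
    alpha_bar = rng.uniform(0.02, 0.98)           # random diffusion timestep
    eps = [rng.gauss(0.0, 1.0) for _ in z]        # injected Gaussian noise
    s, r = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    z_t = [s * zi + r * ei for zi, ei in zip(z, eps)]  # forward diffusion
    eps_hat = toy_noise_predictor(z_t, alpha_bar)
    w = 1.0 - alpha_bar                            # timestep weighting w(t)
    # SDS gradient: w(t) * (eps_hat - eps) * dz/dtheta; the Jacobian of the
    # noise predictor itself is deliberately not computed.
    grad = [w * (eh - e) for eh, e in zip(eps_hat, eps)]
    return [ti - lr * gi for ti, gi in zip(theta, grad)]


def optimize(steps=200, lr=0.5, seed=0):
    """Run SDS from an all-zero initialization and return the parameters."""
    rng = random.Random(seed)
    theta = [0.0] * len(TARGET)
    for _ in range(steps):
        theta = sds_step(theta, lr, rng)
    return theta


if __name__ == "__main__":
    print(optimize())
```

Because this toy predictor is exact, the noise terms cancel and each step shrinks the distance to TARGET by a factor of 1 - lr * sqrt(alpha_bar * (1 - alpha_bar)), so optimize() returns values very close to TARGET. With a real diffusion model the gradient is noisy and the "target" is only implicit in the prompt, which is why practical systems run many steps over many random views.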

Tingting Liao, Hongwei Yi, Yuliang Xiu, Jiaxiang Tang, Yangyi Huang, Justus Thies, Michael J. Black · 2023

Related benchmarks

Task	Dataset	Metric	Result	#
Avatar Generation	30 custom dressed avatar descriptions 1.0 (test)	BLIP VQA	53.06	9
3D Human Generation	User Study 30 prompts	Q1 Best Preference Rate	16.91	8
Text-to-3D Human Generation	30 prompt set Stable Diffusion V1.5 1.0 (test)	FID	120	7
3D Avatar Generation	User Study 50 text prompts (test)	Semantic Alignment	2.89	6
Text-to-3D Avatar Generation	50 Text Prompts	BLIP-VQA	0.5	6
3D Human Avatar Generation	3D Human Generation Evaluation Set (test)	Facial Detail	2.21	6
Text-to-3D Human Generation	DreamHuman 30 prompts Frontal View (test)	CLIP Score	30.13	5
Text-to-3D Human Generation	DreamHuman (30 randomly sampled prompts)	Texture Quality	3.76	5
3D Face Editing	Synthesized 3D faces (editing)	Score	33.73	4
3D Face Generation	Synthesized 3D faces (generation)	Score	34.85	4
