# Text2Mesh: Text-Driven Neural Stylization for Meshes

## About
In this work, we develop intuitive controls for editing the style of 3D objects. Our framework, Text2Mesh, stylizes a 3D mesh by predicting color and local geometric details that conform to a target text prompt. We consider a disentangled representation of a 3D object: a fixed mesh input (content) coupled with a learned neural network, which we term the neural style field network. To modify style, we obtain a similarity score between a text prompt (describing style) and a stylized mesh by harnessing the representational power of CLIP. Text2Mesh requires neither a pre-trained generative model nor a specialized 3D mesh dataset. It can handle low-quality meshes (e.g., non-manifold geometry or open boundaries) of arbitrary genus, and does not require a UV parameterization. We demonstrate the ability of our technique to synthesize a myriad of styles over a wide variety of 3D meshes.
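The core idea is compact enough to sketch: a small MLP (the neural style field) maps each vertex position, after a positional encoding, to an RGB color and a displacement along the vertex normal, and CLIP scores rendered views of the styled mesh against the text prompt. Below is a minimal, hedged PyTorch sketch. The layer widths, frequency count, displacement scale, and the `clip_style_loss` helper are illustrative assumptions rather than the paper's exact configuration, and the differentiable renderer that produces the views is omitted.

```python
import torch
import torch.nn as nn
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git


class NeuralStyleField(nn.Module):
    """Sketch of a neural style field: maps a vertex position (after a
    Fourier-feature positional encoding) to an RGB color and a scalar
    displacement along the vertex normal. Sizes are illustrative."""

    def __init__(self, num_freqs: int = 6, hidden: int = 256):
        super().__init__()
        self.num_freqs = num_freqs
        in_dim = 3 + 3 * 2 * num_freqs  # xyz plus sin/cos per frequency band
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.color_head = nn.Sequential(nn.Linear(hidden, 3), nn.Tanh())
        self.disp_head = nn.Sequential(nn.Linear(hidden, 1), nn.Tanh())

    def encode(self, xyz: torch.Tensor) -> torch.Tensor:
        # Positional encoding lets the MLP represent high-frequency detail.
        freqs = 2.0 ** torch.arange(self.num_freqs, device=xyz.device)
        angles = xyz[..., None] * freqs                    # (V, 3, F)
        enc = torch.cat([angles.sin(), angles.cos()], -1)  # (V, 3, 2F)
        return torch.cat([xyz, enc.flatten(-2)], dim=-1)   # (V, 3 + 6F)

    def forward(self, verts: torch.Tensor, normals: torch.Tensor):
        h = self.backbone(self.encode(verts))
        colors = 0.5 * (self.color_head(h) + 1.0)  # RGB in [0, 1]
        disp = 0.1 * self.disp_head(h)             # keep displacements small
        return verts + disp * normals, colors      # styled geometry + color


def clip_style_loss(clip_model, rendered_views: torch.Tensor,
                    text_tokens: torch.Tensor) -> torch.Tensor:
    """Negative cosine similarity between CLIP embeddings of rendered views
    (already resized/normalized for CLIP) and the target style prompt."""
    img = clip_model.encode_image(rendered_views)
    txt = clip_model.encode_text(text_tokens)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return 1.0 - (img * txt).sum(dim=-1).mean()


# Hypothetical usage; any differentiable rasterizer producing
# (N, 3, 224, 224) views of the styled mesh can supply rendered_views.
# model, _ = clip.load("ViT-B/32")
# tokens = clip.tokenize(["a mesh made of colorful crochet"])
# loss = clip_style_loss(model, rendered_views, tokens)
```

Because the style field is a function of continuous vertex positions rather than a texture image, no UV parameterization is needed: any vertex of any mesh can be queried directly, which is what lets the method handle low-quality inputs.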
## Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Avatar Generation | 3D Avatar Generation Benchmark | FID | 219.6 | 8 |
| Text-to-3D Generation | 28 text-to-3D prompts | Avg. User Preference Rank | 4.53 | 6 |
| Global 3D Editing | Evaluation dataset unseen 3D assets (test) | CLIP Similarity | 0.248 | 6 |
| Local 3D Editing | Evaluation dataset unseen 3D assets (test) | CLIP Similarity | 0.239 | 6 |
| 3D Mesh Stylization | User Study (3D Mesh Stylization) | Overall Quality Score | 3.9 | 4 |
| Text-driven 3D stylization | Multi-object 3D scenes | Alignment Score | 0.262 | 4 |