AIpparel: A Multimodal Foundation Model for Digital Garments
About
Apparel is essential to human life, offering protection, mirroring cultural identities, and showcasing personal style. Yet, the creation of garments remains a time-consuming process, largely due to the manual work involved in designing them. To simplify this process, we introduce AIpparel, a multimodal foundation model for generating and editing sewing patterns. Our model fine-tunes state-of-the-art large multimodal models (LMMs) on a custom-curated large-scale dataset of over 120,000 unique garments, each with multimodal annotations including text, images, and sewing patterns. Additionally, we propose a novel tokenization scheme that concisely encodes these complex sewing patterns so that LLMs can learn to predict them efficiently. AIpparel achieves state-of-the-art performance in single-modal tasks, including text-to-garment and image-to-garment prediction, and enables novel multimodal garment generation applications such as interactive garment editing. The project website is at https://georgenakayama.github.io/AIpparel/.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image-to-garment prediction | FTAG Hit Reaction sequence (test) | CD | 57.6 | 4 |
| Image-to-garment prediction | FTAG Northern Spin sequence (test) | CD | 257 | 4 |
| Image-to-garment prediction | FTAG Average across sequences (test) | CD | 114.9 | 4 |
| Sewing-pattern generation | GCD-MM 1.0 (test) | Vertex L2 Distance | 4.8 | 4 |
| Image-to-garment prediction | FTAG Jumping Jack sequence (test) | CD | 98.1 | 4 |
| Image-to-garment prediction | FTAG Joyful Jump sequence (test) | CD | 46.8 | 4 |
| Clothing reconstruction | 4D-Dress Lower (test) | CD | 380 | 4 |
| Clothing reconstruction | 4D-Dress Upper (test) | Chamfer Distance | 380 | 4 |
| Single-view sewing pattern recovery | Photorealistic Synthetic Image 18 Whole-body Garment GPT-4o (test) | Chamfer Distance | 6.09 | 3 |
| Sewing-pattern editing | GCD-MM 1.0 (test) | Vertex L2 | 2.5 | 3 |
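
Most results above are reported as a Chamfer distance (abbreviated CD in several rows) or a vertex L2 distance. As a rough illustration of what these metrics measure, here is a minimal sketch in plain Python. It assumes one common convention: the mean nearest-neighbour Euclidean distance in each direction, summed, for Chamfer distance, and the mean per-vertex Euclidean distance for vertex L2. The listed benchmarks may use squared distances, different scaling, or different units, so this is not their evaluation code. The function names `chamfer_distance` and `vertex_l2` are hypothetical.

```python
import math
from typing import Sequence

Point = Sequence[float]


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance between two non-empty point sets.

    Assumed convention (not taken from any listed benchmark): average
    the nearest-neighbour Euclidean distance in each direction, then
    add the two averages.
    """
    # For every point in a, the distance to its closest point in b.
    a_to_b = sum(min(math.dist(p, q) for q in b) for p in a) / len(a)
    # For every point in b, the distance to its closest point in a.
    b_to_a = sum(min(math.dist(q, p) for p in a) for q in b) / len(b)
    return a_to_b + b_to_a


def vertex_l2(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean Euclidean distance between corresponding vertices.

    Unlike Chamfer distance, this needs a one-to-one vertex
    correspondence, so both sequences must have the same length.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of vertices")
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)


if __name__ == "__main__":
    predicted = [(0.0, 0.0), (1.0, 0.0)]
    reference = [(0.0, 0.0), (1.0, 1.0)]
    print(chamfer_distance(predicted, reference))
    print(vertex_l2(predicted, reference))
```

Chamfer distance suits comparisons where the predicted and reference garments have different numbers of points, while vertex L2 only applies when the two share the same vertex ordering. Both are lower-is-better.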