AIpparel: A Multimodal Foundation Model for Digital Garments
About
Apparel is essential to human life, offering protection, mirroring cultural identities, and showcasing personal style. Yet, the creation of garments remains a time-consuming process, largely due to the manual work involved in designing them. To simplify this process, we introduce AIpparel, a multimodal foundation model for generating and editing sewing patterns. Our model fine-tunes state-of-the-art large multimodal models (LMMs) on a custom-curated large-scale dataset of over 120,000 unique garments, each with multimodal annotations including text, images, and sewing patterns. Additionally, we propose a novel tokenization scheme that concisely encodes these complex sewing patterns so that LLMs can learn to predict them efficiently. AIpparel achieves state-of-the-art performance in single-modal tasks, including text-to-garment and image-to-garment prediction, and enables novel multimodal garment generation applications such as interactive garment editing. The project website is at https://georgenakayama.github.io/AIpparel/.
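The abstract does not spell out the tokenization scheme. The sketch below is a rough illustration of why a pattern-specific vocabulary can be more compact than writing coordinates out as text. It assumes a hypothetical design: each panel is bracketed by special tokens, each edge contributes a single discrete type token, and each edge's continuous parameters are kept in a parallel list (for example, for a regression head to predict) rather than spelled out digit by digit. The token strings, the `Panel`/`Edge` classes, and the character-level baseline are illustrative assumptions, not AIpparel's actual format.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical special tokens; not taken from the AIpparel release.
PANEL_START, PANEL_END = "<panel_start>", "<panel_end>"
EDGE_TOKENS = {"line": "<line>", "quadratic": "<quad>", "cubic": "<cubic>"}


@dataclass
class Edge:
    kind: str                    # "line", "quadratic" or "cubic"
    end: tuple[float, float]     # endpoint relative to the edge start
    controls: list[tuple[float, float]] = field(default_factory=list)


@dataclass
class Panel:
    name: str
    edges: list[Edge]


def tokenize(panels: list[Panel]) -> tuple[list[str], list[list[float] | None]]:
    """One discrete token per edge, plus a parallel list of continuous params."""
    tokens: list[str] = []
    params: list[list[float] | None] = []
    for panel in panels:
        tokens.append(PANEL_START)
        params.append(None)  # structural tokens carry no geometry
        for edge in panel.edges:
            tokens.append(EDGE_TOKENS[edge.kind])
            # Flatten endpoint and any control points into one vector.
            params.append([c for pt in (edge.end, *edge.controls) for c in pt])
        tokens.append(PANEL_END)
        params.append(None)
    return tokens, params


def naive_char_tokens(panels: list[Panel]) -> list[str]:
    """Crude baseline: every coordinate written as text, one token per character."""
    text = ";".join(
        ",".join(f"{c:.2f}" for pt in (e.end, *e.controls) for c in pt)
        for p in panels
        for e in p.edges
    )
    return list(text)


# Usage: a rectangular panel made of four straight edges.
front = Panel("front", [
    Edge("line", (50.0, 0.0)),
    Edge("line", (0.0, 70.0)),
    Edge("line", (-50.0, 0.0)),
    Edge("line", (0.0, -70.0)),
])
tokens, params = tokenize([front])
print(len(tokens), "structured tokens vs", len(naive_char_tokens([front])), "character tokens")
# prints: 6 structured tokens vs 45 character tokens
```

In this toy design each edge costs one token regardless of how many digits its coordinates need, so the gap widens as patterns grow; the trade-off is that the model needs a separate mechanism for emitting the continuous values.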
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image-to-garment prediction | FTAG Hit Reaction sequence (test) | Chamfer distance | 57.6 | 4 |
| Image-to-garment prediction | FTAG Northern Spin sequence (test) | Chamfer distance | 257 | 4 |
| Image-to-garment prediction | FTAG Average across sequences (test) | Chamfer distance | 114.9 | 4 |
| Image-to-garment prediction | FTAG Jumping Jack sequence (test) | Chamfer distance | 98.1 | 4 |
| Image-to-garment prediction | FTAG Joyful Jump sequence (test) | Chamfer distance | 46.8 | 4 |
| Clothing reconstruction | 4D-Dress Lower (test) | Chamfer distance | 380 | 4 |
| Clothing reconstruction | 4D-Dress Upper (test) | Chamfer distance | 380 | 4 |
| Sewing pattern panel quality estimation | GCD (test) | IoU | 0.834 | 2 |
| Stitching prediction | GCD | F1 score | 82.1 | 2 |
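Lower is better for Chamfer distance; higher is better for IoU and F1. The evaluation protocols behind these numbers (point sampling density, squared vs. unsquared distances, units and scaling) are not given here. The sketch below therefore shows only one common form of two of the metrics, under stated assumptions. The symmetric Chamfer distance is taken as the sum of mean nearest-neighbour Euclidean distances in both directions. F1 is computed over predicted stitches, with each stitch treated as an unordered pair of hypothetical edge ids. Values from this code are not comparable to the table's figures.

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Assumed convention: mean nearest-neighbour Euclidean distance from a to b
    plus the same from b to a. Benchmarks may instead square the distances,
    average the two directions, or rescale units.
    """
    # Pairwise distances via broadcasting, shape (N, M); O(N*M) memory.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())


def stitch_f1(pred: set, true: set) -> float:
    """F1 over stitches; each stitch is an unordered pair of (hypothetical) edge ids."""
    pred_pairs = {frozenset(s) for s in pred}
    true_pairs = {frozenset(s) for s in true}
    tp = len(pred_pairs & true_pairs)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_pairs)
    recall = tp / len(true_pairs)
    return 2 * precision * recall / (precision + recall)
```

`stitch_f1` returns a value in [0, 1]; the F1 figure in the table appears to be scaled by 100.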