StableGarment: Garment-Centric Generation via Stable Diffusion

About

In this paper, we introduce StableGarment, a unified framework to tackle garment-centric(GC) generation tasks, including GC text-to-image, controllable GC text-to-image, stylized GC text-to-image, and robust virtual try-on. The main challenge lies in retaining the intricate textures of the garment while maintaining the flexibility of pre-trained Stable Diffusion. Our solution involves the development of a garment encoder, a trainable copy of the denoising UNet equipped with additive self-attention (ASA) layers. These ASA layers are specifically devised to transfer detailed garment textures, also facilitating the integration of stylized base models for the creation of stylized images. Furthermore, the incorporation of a dedicated try-on ControlNet enables StableGarment to execute virtual try-on tasks with precision. We also build a novel data engine that produces high-quality synthesized data to preserve the model's ability to follow prompts. Extensive experiments demonstrate that our approach delivers state-of-the-art (SOTA) results among existing virtual try-on methods and exhibits high flexibility with broad potential applications in various garment-centric image generation.

Rui Wang, Hailong Guo, Jiaming Liu, Huaxia Li, Haibo Zhao, Xu Tang, Yao Hu, Hao Tang, Peipei Li• 2024

Related benchmarks

Task	Dataset	Result
Virtual Try-On	VITON-HD paired	LPIPS0.115	36
Virtual Try-On	DressCode	LPIPS0.0641	19
model-free try-on	Omni-TryOn (test)	DINO-I39.49	11
model-based try-on	VITON-HD	FID9.5081	9
model-free try-on	VITON-HD	DINO-I0.371	6
Multi-Garment Virtual Dressing	DressCode multi-garment (test)	FID104.4	6
Virtual Dressing	DressCode single-garment (test)	FID79.495	6
Virtual Dressing	VITON-HD single-garment (test)	FID57.876	6
Multi-garment dressing	Dressing-Pair (test)	CLIP-T Score0.284	5
Single-garment dressing	VITON-HD (test)	CLIP-T Score0.285	5

Showing 10 of 12 rows

Other info

Follow for update

@wizwand_team Discord