
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

About

We present OOTDiffusion, a novel network architecture for realistic and controllable image-based virtual try-on (VTON). We leverage the power of pretrained latent diffusion models, designing an outfitting UNet to learn the garment detail features. Without a redundant warping process, the garment features are precisely aligned with the target human body via the proposed outfitting fusion in the self-attention layers of the denoising UNet. To further enhance controllability, we introduce outfitting dropout to the training process, which enables us to adjust the strength of the garment features through classifier-free guidance. Our comprehensive experiments on the VITON-HD and Dress Code datasets demonstrate that OOTDiffusion efficiently generates high-quality try-on results for arbitrary human and garment images, outperforming other VTON methods in both realism and controllability, indicating an impressive breakthrough in virtual try-on. Our source code is available at https://github.com/levihsu/OOTDiffusion.
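As a rough illustration of the classifier-free guidance mechanism that outfitting dropout enables (a minimal sketch; the function and variable names are ours, not from the paper's released code): because garment features are randomly dropped during training, the denoising UNet can be run twice per sampling step at inference, once with the garment features omitted and once with them fused in, and the two noise predictions blended with a guidance scale that sets the strength of the garment detail.

```python
def cfg_blend(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance blend of two noise predictions.

    eps_uncond -- prediction with garment features dropped (the dropout path)
    eps_cond   -- prediction with garment features fused via outfitting fusion
    scale      -- guidance scale; larger values strengthen the garment features
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

# scale = 1.0 reproduces the conditional prediction exactly;
# scale = 0.0 ignores the garment entirely;
# scale > 1.0 extrapolates toward stronger garment detail.
```

In practice the predictions are tensors and the blend is a single vectorized expression, but the arithmetic is the same per element.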

Yuhao Xu, Tao Gu, Weifeng Chen, Chengcai Chen • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Virtual Try-On | VITON-HD (test) | SSIM 85.13 | 48 |
| Virtual Try-On | VITON-HD 1.0 (test) | FID 6.5186 | 27 |
| Virtual Try-On | DressCode (test) | FID 3.9497 | 23 |
| Virtual Try-On | VITON paired HD (test) | FID 9.3 | 19 |
| Virtual Try-On | DressCode 1.0 (test) | FID 3.9497 | 14 |
| Image Virtual Try-on | VITON-HD | LPIPS 0.0876 | 14 |
| Virtual Try-On | VITON-HD unpaired 1.0 (test) | FID 12.41 | 14 |
| Virtual Try-On | StreetTryOn Shop-to-Street | FID 42.318 | 13 |
| Virtual Try-On | DressCode Upper (unpaired and paired) | FIDu 25.84 | 13 |
| Virtual Try-On | DressCode Dresses (unpaired and paired) | FIDu 52.229 | 13 |

(Showing 10 of 24 rows)
