
TryOnDiffusion: A Tale of Two UNets

About

Given two images depicting a person and a garment worn by another person, our goal is to generate a visualization of how the garment might look on the input person. A key challenge is to synthesize a photorealistic, detail-preserving visualization of the garment while warping it to accommodate a significant change in body pose and shape across the subjects. Previous methods either focus on garment detail preservation without effective pose and shape variation, or allow try-on with the desired shape and pose but lack garment details. In this paper, we propose a diffusion-based architecture that unifies two UNets (referred to as Parallel-UNet), which allows us to preserve garment details and warp the garment for significant pose and body change in a single network. The key ideas behind Parallel-UNet are: 1) the garment is warped implicitly via a cross-attention mechanism, and 2) garment warping and person blending happen as part of a unified process rather than as a sequence of two separate tasks. Experimental results indicate that TryOnDiffusion achieves state-of-the-art performance both qualitatively and quantitatively.
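The idea of warping a garment implicitly through cross attention can be illustrated with a minimal sketch. This is not the paper's Parallel-UNet: it assumes plain single-head scaled dot-product attention, omits the learned query/key/value projections a real network would use, and the names `cross_attention`, `person_feats`, and `garment_feats` are hypothetical. Each person location acts as a query that pulls in a weighted mix of garment features, so no explicit flow field or warp is computed.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(person_feats, garment_feats):
    """Single-head scaled dot-product cross attention (illustrative only).

    person_feats:  list of query vectors, one per person location.
    garment_feats: list of vectors used as both keys and values.
    Learned projections are omitted; this is an assumption of the sketch.
    """
    dim = len(person_feats[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for q in person_feats:
        # Similarity of this person location to every garment location.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k))
                  for k in garment_feats]
        attn = softmax(scores)
        weights.append(attn)
        # Weighted sum of garment features: the "implicitly warped" feature.
        outputs.append([sum(a * v[d] for a, v in zip(attn, garment_feats))
                        for d in range(dim)])
    return outputs, weights


# Toy features: three person locations, two garment locations.
person = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
garment = [[4.0, 0.0], [0.0, 4.0]]
warped, attn = cross_attention(person, garment)
```

In this toy setup the first person location attends mostly to the first garment location, while the third, being equally similar to both, blends them evenly. The output has one garment-derived feature per person location, which is what lets the garment content follow the person's layout.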

Luyang Zhu, Dawei Yang, Tyler Zhu, Fitsum Reda, William Chan, Chitwan Saharia, Mohammad Norouzi, Ira Kemelmacher-Shlizerman • 2023

Related benchmarks

Task                  Dataset                                Result                        Rank
Virtual Try-On        VITON-HD unpaired 1.0 (test)           FID 23.352                    14
Virtual Try-On        DressCode triplets (test)              FID 15.944                    6
Video Virtual Try-on  Our Dataset Internet Videos v1 (test)  FID 95                        5
Video Virtual Try-on  UBC (test)                             FID 94                        5
Virtual Try-On        6K unpaired 1.0 (test)                 FID 13.447                    4
Virtual Try-On        unpaired 6K (Random test)              User Preference Rate 92.72    4
Virtual Try-On        unpaired 6K (Challenging test)         User Preference Rate 9.58e+3  4
Video Virtual Try-on  Our Dataset (test)                     Video Smoothness 0.03         4
Virtual Try-On        8,300 triplets (test)                  FID 19.459                    2
Virtual Try-On        1,000 paired (test)                    SSIM 0.883                    2