ViViD: Video Virtual Try-on using Diffusion Models

About

Video virtual try-on aims to transfer a clothing item onto a video of a target person. Applying image-based try-on techniques to video frame by frame produces temporally inconsistent results, while previous video-based try-on solutions generate only low-quality, blurry results. In this work, we present ViViD, a novel framework that employs powerful diffusion models to tackle video virtual try-on. Specifically, we design the Garment Encoder to extract fine-grained clothing semantic features, guiding the model to capture garment details and inject them into the target video through the proposed attention feature fusion mechanism. To ensure spatio-temporal consistency, we introduce a lightweight Pose Encoder to encode pose signals, enabling the model to learn the interactions between clothing and human posture, and we insert hierarchical Temporal Modules into the text-to-image Stable Diffusion model for more coherent and lifelike video synthesis. Furthermore, we collect a new dataset that is, to date, the largest for video virtual try-on, with the most diverse garment types and the highest resolution. Extensive experiments demonstrate that our approach yields satisfactory video try-on results. The dataset, code, and weights will be publicly available. Project page: https://becauseimbatman0.github.io/ViViD.
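The abstract names an attention feature fusion mechanism but does not spell out its details here. As a rough illustration only, the sketch below shows one common way such fusion is done in diffusion-based try-on models: garment tokens are concatenated with video-frame tokens, self-attention runs over the joint sequence, and only the video positions are kept. This reading is an assumption, not the authors' confirmed implementation; the function name `fused_self_attention`, the projection matrices, and all shapes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fused_self_attention(video_tokens, garment_tokens, w_q, w_k, w_v):
    """Hypothetical fusion step (an assumption, not the paper's code).

    video_tokens:   (N_v, C) features of one video frame
    garment_tokens: (N_g, C) features from a garment encoder
    Returns (N_v, C_out): video features after attending to garment features.
    """
    # Concatenate along the token axis so video tokens can attend to garment tokens.
    joint = np.concatenate([video_tokens, garment_tokens], axis=0)
    q, k, v = joint @ w_q, joint @ w_k, joint @ w_v
    # Standard scaled dot-product attention over the joint sequence.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    out = softmax(scores, axis=-1) @ v
    # Keep only the positions that belong to the video frame.
    return out[: video_tokens.shape[0]]


# Toy usage with random features (all sizes are illustrative).
rng = np.random.default_rng(0)
C = 8
video = rng.normal(size=(16, C))    # 16 video tokens
garment = rng.normal(size=(4, C))   # 4 garment tokens
w_q, w_k, w_v = (rng.normal(size=(C, C)) for _ in range(3))
fused = fused_self_attention(video, garment, w_q, w_k, w_v)
print(fused.shape)
```

Discarding the garment positions after attention keeps the output the same shape as the video features, so a step like this could be placed inside an existing denoising network without changing the layers that follow it.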

Zixun Fang, Wei Zhai, Aimin Su, Hongliang Song, Kai Zhu, Mao Wang, Yu Chen, Zhiheng Liu, Yang Cao, Zheng-Jun Zha • 2024

Related benchmarks

Task	Dataset	Result
Video Virtual Try-on	VVT (test)	SSIM 0.949	17
Video Virtual Try-on	ViViD (test)	SSIM 0.846	13
Video Virtual Try-on	ViT-HD	VFID (I^p) 19.0568	7
Video Virtual Try-on	VVT 11 (test)	VFID^p_I 3.793	7
Video Virtual Try-on	TripVVT-Bench 1.0 (test)	VFID (Image) 26.762	5
Video Virtual Try-on	TripVVT-Bench	Rank-1 Accuracy 2.6	5
Video Virtual Try-on	ViViD-S (test)	VFID (I) 21.8032	5
Interactive Video Virtual Try-On	VVT-Interact (paired)	FVD 468.5	4
Interactive Video Virtual Try-On	VVT-Interact (unpaired)	FVD 482.2	4
