DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion
About
We present DreamPose, a diffusion-based method for generating animated fashion videos from still images. Given an image and a sequence of human body poses, our method synthesizes a video containing both human and fabric motion. To achieve this, we transform a pretrained text-to-image model (Stable Diffusion) into a pose-and-image guided video synthesis model, using a novel fine-tuning strategy, a set of architectural changes to support the added conditioning signals, and techniques to encourage temporal consistency. We fine-tune on a collection of fashion videos from the UBC Fashion dataset. We evaluate our method on a variety of clothing styles and poses, and demonstrate that our method produces state-of-the-art results on fashion video animation. Video results are available on our project page.
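To make the conditioning scheme more concrete, below is a minimal PyTorch sketch of one way a pretrained denoising UNet could be extended with pose and image conditioning. It assumes a diffusers-style UNet interface (`sample, timestep, encoder_hidden_states`); all names here (`PoseConditionedUNet`, `pose_proj`, `image_embeds`) are illustrative assumptions, not the actual DreamPose code, and the additive zero-initialized pose projection is a simplification of the paper's modified input layer.

```python
import torch
import torch.nn as nn

class PoseConditionedUNet(nn.Module):
    """Hypothetical wrapper adding pose-and-image conditioning to a
    pretrained latent-diffusion UNet (names are illustrative only)."""

    def __init__(self, unet: nn.Module, latent_channels: int = 4, num_poses: int = 5):
        super().__init__()
        self.unet = unet  # pretrained Stable Diffusion denoising UNet
        # Zero-initialized 1x1 projection of the pose maps, so that at the
        # start of fine-tuning the added signal does not perturb the
        # pretrained weights (a common trick for new conditioning inputs).
        self.pose_proj = nn.Conv2d(num_poses, latent_channels, kernel_size=1)
        nn.init.zeros_(self.pose_proj.weight)
        nn.init.zeros_(self.pose_proj.bias)

    def forward(self, noisy_latents, timesteps, image_embeds, pose_maps):
        # pose_maps: (B, num_poses, H, W) rasterized target poses,
        # resized to the latent resolution.
        cond = noisy_latents + self.pose_proj(pose_maps)
        # image_embeds stand in for the usual text embeddings: the input
        # image is encoded (e.g. with a CLIP image encoder) and injected
        # through the UNet's cross-attention layers.
        return self.unet(cond, timesteps, encoder_hidden_states=image_embeds)
```

Conditioning on a short window of consecutive poses, rather than a single pose, is one way to encourage temporal consistency across generated frames.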
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Human Dance Generation | TikTok (test) | SSIM | 0.511 | 17 |
| 2D Character Animation | TikTok dancing dataset | PSNR | 28.01 | 7 |
| 2D Character Animation | TED-talks dataset | FVD | 140.1 | 6 |