Everybody Dance Now

About

This paper presents a simple method for "do as I do" motion transfer: given a source video of a person dancing, we can transfer that performance to a novel (amateur) target after only a few minutes of footage of the target subject performing standard moves. We approach this problem as video-to-video translation using pose as an intermediate representation. To transfer the motion, we extract poses from the source subject and apply the learned pose-to-appearance mapping to generate the target subject. We predict two consecutive frames for temporally coherent video results and introduce a separate pipeline for realistic face synthesis. Although our method is quite simple, it produces surprisingly compelling results (see video). This motivates us to also provide a forensics tool for reliable synthetic content detection that can distinguish videos synthesized by our system from real data. In addition, we release a first-of-its-kind open-source dataset of videos that can be legally used for training and motion transfer.
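The transfer loop described above can be sketched as follows. This is a minimal, runnable illustration of the control flow only, not the authors' implementation: `extract_pose`, `generator`, `transfer`, and the frame size are hypothetical, the pose detector and generator are toy NumPy stand-ins for a pretrained keypoint estimator and a trained image-to-image network, and the temporal step assumes the generator conditions each output frame on the current pose plus the previously synthesized frame. The separate face-synthesis pipeline is not modeled.

```python
import numpy as np

H, W = 64, 64  # hypothetical frame size


def extract_pose(frame: np.ndarray) -> np.ndarray:
    """Hypothetical pose extractor returning a single-channel pose map.

    Toy stand-in: thresholds grayscale intensity. A real system would run
    a pretrained 2D keypoint detector and draw a stick figure."""
    gray = frame.mean(axis=-1)
    return (gray > gray.mean()).astype(np.float32)


def generator(pose_t: np.ndarray, prev_frame: np.ndarray) -> np.ndarray:
    """Hypothetical pose-to-appearance mapping conditioned on the previous output.

    Toy stand-in: blends an image derived from the pose with the previously
    generated frame, which ties consecutive outputs together."""
    pose_rgb = np.repeat(pose_t[..., None], 3, axis=-1)
    return 0.5 * pose_rgb + 0.5 * prev_frame


def transfer(source_frames: np.ndarray) -> np.ndarray:
    """Map each source frame to a target-subject frame, one step at a time."""
    outputs = []
    prev = np.zeros((H, W, 3), dtype=np.float32)  # no previous frame at t = 0
    for frame in source_frames:
        pose = extract_pose(frame)   # source subject -> pose representation
        out = generator(pose, prev)  # pose (+ previous output) -> target appearance
        outputs.append(out)
        prev = out                   # feed this result back for the next frame
    return np.stack(outputs)


rng = np.random.default_rng(0)
video = transfer(rng.random((5, H, W, 3)))  # five random "source" frames
```

Feeding each output back in as input is what makes the loop sequential: frame t cannot be generated until frame t-1 exists.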

Caroline Chan, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros • 2018

Related benchmarks

Task	Dataset	Metric	Result
Sign Language Video Generation	RWTH-PHOENIX-Weather 2014T (test)	SSIM	73.7	10
Sign Language Video Generation	ASL production dataset	SSIM	0.737	7
Video Reenactment	Personal story dataset (test)	Image Error (IE)	1.75	7
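The sign-language rows above report SSIM (structural similarity) between generated and reference frames. As a sketch of what that metric computes, the function below evaluates the SSIM formula once over the whole image. `global_ssim` is a hypothetical name, and this single-window form is a simplification: standard SSIM averages the same statistic over local Gaussian-weighted windows, so its numbers will not match benchmark values exactly. SSIM is at most 1, so a score listed as 73.7 is presumably 0.737 expressed as a percentage.

```python
import numpy as np


def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 1.0) -> float:
    """Single-window SSIM over the whole image (simplified; not windowed SSIM)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)
```

Identical images score 1.0, and added noise lowers the score.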
