
Audio to Body Dynamics

About

We present a method that takes as input audio of violin or piano playing and outputs a video of skeleton predictions, which are then used to animate an avatar. The key idea is to create an animation of an avatar whose hands move the way a pianist's or violinist's would, driven by audio alone. Fully detailed, correct arm and finger motion remains the ultimate goal, but it is not even clear whether body movement can be predicted from music at all. In this paper, we present the first result showing that natural body dynamics can indeed be predicted. We built an LSTM network trained on violin and piano recital videos uploaded to the Internet. The predicted points are applied to a rigged avatar to create the animation.
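The pipeline the abstract describes is a sequence model that maps per-frame audio features to body keypoint coordinates. The sketch below is a minimal, illustrative version of that idea, not the paper's exact architecture: the feature size, hidden size, keypoint count, and the choice of MFCC-like inputs are all assumptions, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 28 audio features per frame, hidden size 64,
# and 50 upper-body/hand keypoints, each an (x, y) pair.
N_FEAT, HIDDEN, N_KEYPOINTS = 28, 64, 50

# Randomly initialised LSTM-cell weights; a trained model would learn these.
W = rng.standard_normal((4 * HIDDEN, N_FEAT + HIDDEN)) * 0.1
b = np.zeros(4 * HIDDEN)
W_out = rng.standard_normal((2 * N_KEYPOINTS, HIDDEN)) * 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def audio_to_keypoints(audio_frames):
    """Run an LSTM over per-frame audio features; emit (x, y) keypoints per frame."""
    h = np.zeros(HIDDEN)  # hidden state
    c = np.zeros(HIDDEN)  # cell state
    out = []
    for x in audio_frames:
        z = W @ np.concatenate([x, h]) + b
        i, f, g, o = np.split(z, 4)                      # input/forget/candidate/output gates
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)     # update cell state
        h = sigmoid(o) * np.tanh(c)                      # update hidden state
        out.append((W_out @ h).reshape(N_KEYPOINTS, 2))  # keypoints for this frame
    return np.stack(out)

frames = rng.standard_normal((120, N_FEAT))  # a few seconds of audio features
keypoints = audio_to_keypoints(frames)
print(keypoints.shape)  # one (50, 2) keypoint set per frame: (120, 50, 2)
```

The per-frame keypoint sets would then drive a rigged avatar, with each predicted point mapped to a joint of the rig.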

Eli Shlizerman, Lucio M. Dery, Hayden Schoen, Ira Kemelmacher-Shlizerman • 2017

Related benchmarks

Task | Dataset | Metric | Result | Rank
Speech to gesture translation | Speech2Gesture 1.0 (test) | Fooled Rate (%) | 10 | 12
Speech to gesture translation | Speech2Gesture Meyers 1.0 (test) | Percentage Fooled | 0.375 | 6
Speech to gesture translation | Speech2Gesture Oliver 1.0 (test) | Percentage Fooled | 18.2 | 6
Conversational Gesture Synthesis | User Study Conversational Gestures (test) | Naturalness | 3.15 | 5
Lip motion prediction | Speech2Gesture Subjects v1 (test) | Oliver Prediction Error (mm) | 0.3 | 5
Music-to-Dance Generation | 71-hour music-to-dance dataset 1.0 | FID | 73.8 | 5
