TEMOS: Generating diverse human motions from textual descriptions

About

We address the problem of generating diverse 3D human motions from textual descriptions. This challenging task requires joint modeling of both modalities: understanding and extracting useful human-centric information from the text, and then generating plausible and realistic sequences of human poses. In contrast to most previous work, which focuses on generating a single, deterministic motion from a textual description, we design a variational approach that can produce multiple diverse human motions. We propose TEMOS, a text-conditioned generative model leveraging variational autoencoder (VAE) training with human motion data, in combination with a text encoder that produces distribution parameters compatible with the VAE latent space. We show the TEMOS framework can produce both skeleton-based animations, as in prior work, as well as more expressive SMPL body motions. We evaluate our approach on the KIT Motion-Language benchmark and, despite being relatively straightforward, demonstrate significant improvements over the state of the art. Code and models are available on our webpage.

Mathis Petrovich, Michael J. Black, Gül Varol • 2022
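
The abstract describes a text encoder that outputs distribution parameters in the VAE latent space, so that several different motions can be generated for one description. The sketch below illustrates only that sampling step, under stated assumptions: the latent distribution is taken to be a diagonal Gaussian parameterized by a mean and a log-variance, the function name `sample_latents` and the numeric values are hypothetical, and the text encoder and motion decoder are omitted. It is not the authors' implementation.

```python
import math
import random


def sample_latents(mu, logvar, num_samples, seed=None):
    """Draw latent vectors z = mu + sigma * eps, with eps ~ N(0, I).

    mu and logvar are per-dimension parameters of a diagonal Gaussian
    (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    sigma = [math.exp(0.5 * lv) for lv in logvar]
    return [
        [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        for _ in range(num_samples)
    ]


# Hypothetical parameters standing in for a text encoder's output.
mu = [0.2, -1.0, 0.5]
logvar = [-1.0, -2.0, 0.0]

# Each sampled latent would be passed to a motion decoder (not shown),
# giving a different motion for the same description.
latents = sample_latents(mu, logvar, num_samples=4, seed=0)
for z in latents:
    print([round(v, 3) for v in z])
```

Using the mean directly instead of sampling would give a single deterministic latent, which corresponds to the one-motion-per-description setting the abstract contrasts against.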

Related benchmarks

Task	Dataset	Metric	Value	
Text-to-motion generation	HumanML3D (test)	FID	3.734	553
text-to-motion mapping	HumanML3D (test)	FID	3.734	283
text-to-motion mapping	KIT-ML (test)	R Precision (Top 3)	67	275
Text-to-motion generation	KIT-ML (test)	FID	3.717	206
Text-to-motion generation	HumanML3D	FID	3.734	91
Text-to-Motion Synthesis	KIT-ML	R Precision Top 3	68.7	58
Text-to-motion	KIT-ML	R@3	68.7	44
Motion-to-text retrieval	KIT-ML (test)	R@1	41.88	41
Interactive Motion Synthesis	InterHuman (test)	R Precision (Top 1)	22.4	37
Motion-to-text retrieval	HumanML3D (test)	R@1	39.96	33

Showing 10 of 30 rows
