
M$^3$GPT: An Advanced Multimodal, Multitask Framework for Motion Comprehension and Generation

About

This paper presents M$^3$GPT, an advanced Multimodal, Multitask framework for Motion comprehension and generation. M$^3$GPT operates on three fundamental principles. The first focuses on creating a unified representation space for various motion-relevant modalities. We employ discrete vector quantization for multimodal conditional signals, such as text, music and motion/dance, enabling seamless integration into a large language model (LLM) with a single vocabulary. The second involves modeling motion generation directly in the raw motion space. This strategy circumvents the information loss associated with a discrete tokenizer, resulting in more detailed and comprehensive motion generation. Third, M$^3$GPT learns to model the connections and synergies among various motion-relevant tasks. Text, the most familiar and well-understood modality for LLMs, is utilized as a bridge to establish connections between different motion tasks, facilitating mutual reinforcement. To our knowledge, M$^3$GPT is the first model capable of comprehending and generating motions based on multiple signals. Extensive experiments highlight M$^3$GPT's superior performance across various motion-relevant tasks and its powerful zero-shot generalization capabilities for extremely challenging tasks. Project page: https://github.com/luomingshuang/M3GPT
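The first principle (discrete vector quantization into a single vocabulary) can be illustrated with a minimal sketch. This is not the authors' implementation: the codebooks, the `TEXT_VOCAB_SIZE` constant, and the functions `nearest_code` and `quantize` are hypothetical names chosen for illustration, and the nearest-neighbour lookup stands in for whatever tokenizer the paper actually trains. It only shows how per-modality codebook indices might be offset so that text, motion, and music tokens share one ID space.

```python
from typing import List, Sequence

# Assumption: a toy text vocabulary size; real LLM vocabularies are far larger.
TEXT_VOCAB_SIZE = 100


def nearest_code(vec: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to vec (squared L2)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vec, code))
        if dist < best_dist:  # strict '<' keeps the first entry on ties
            best_idx, best_dist = idx, dist
    return best_idx


def quantize(features: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]],
             offset: int) -> List[int]:
    """Map each feature vector to a token ID in the shared vocabulary."""
    return [offset + nearest_code(f, codebook) for f in features]


# Hypothetical toy codebooks for two modalities.
motion_codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
music_codebook = [[0.0], [0.5], [1.0]]

# Motion tokens follow the text tokens; music tokens follow the motion tokens.
MOTION_OFFSET = TEXT_VOCAB_SIZE
MUSIC_OFFSET = MOTION_OFFSET + len(motion_codebook)

motion_ids = quantize([[0.1, 0.1], [0.9, 0.2], [0.8, 0.9]], motion_codebook, MOTION_OFFSET)
music_ids = quantize([[0.4], [0.95]], music_codebook, MUSIC_OFFSET)
```

Because each modality occupies a disjoint ID range, a single embedding table can cover all of them. The sketch covers only the tokenization of conditioning signals; the abstract's second principle, generating in the raw motion space, is not represented here.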

Mingshuang Luo, Ruibing Hou, Zhuo Li, Hong Chang, Zimo Liu, Yaowei Wang, Shiguang Shan • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| Music-to-Dance Generation | FineDance | BAS 0.2231 | 23 |
| Music-to-Dance | AIST++ | FIDk 23.01 | 17 |
| Text-to-motion | Motion-X | R TOP1 66.1 | 17 |
| Dance-to-Music | AIST++ | BCS 94.3 | 17 |
| Motion In-between | Motion-X | FID 0.604 | 15 |
| Motion Prediction | Motion-X | FID 0.682 | 15 |
| Motion-to-Text | Motion-X | RPrecision Top3 84.6 | 14 |
| Motion In-between | Motion-X | MPJPE 51 | 4 |
| Motion Prediction | Motion-X | MPJPE 54.2 | 4 |
| Dance-to-Music | FineDance (test) | BCS 84.84 | 3 |
Showing 10 of 13 rows
