MG-MotionLLM: A Unified Framework for Motion Comprehension and Generation across Multiple Granularities

About

Recent motion-aware large language models have demonstrated promising potential in unifying motion comprehension and generation. However, existing approaches primarily focus on coarse-grained motion-text modeling, where text describes the overall semantics of an entire motion sequence in just a few words. This limits their ability to handle fine-grained motion-relevant tasks, such as understanding and controlling the movements of specific body parts. To overcome this limitation, we pioneer MG-MotionLLM, a unified motion-language model for multi-granular motion comprehension and generation. We further introduce a comprehensive multi-granularity training scheme by incorporating a set of novel auxiliary tasks, such as localizing temporal boundaries of motion segments via detailed text as well as motion detailed captioning, to facilitate mutual reinforcement for motion-text modeling across various levels of granularity. Extensive experiments show that our MG-MotionLLM achieves superior performance on classical text-to-motion and motion-to-text tasks, and exhibits potential in novel fine-grained motion comprehension and editing tasks. Project page: CVI-SZU/MG-MotionLLM

Bizhu Wu, Jinheng Xie, Keming Shen, Zhe Kong, Jianfeng Ren, Ruibin Bai, Rong Qu, Linlin Shen • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Text-to-motion generation | HumanML3D (test) | FID 0.303 | 481 |
| Text-to-motion mapping | HumanML3D (test) | FID 0.303 | 283 |
| Text-to-motion generation | HumanML3D | R-Precision (Top 1) 51.6 | 64 |
| Text-driven Motion Generation | HumanML3D (test) | R-Precision@1 51.6 | 54 |
| Text-to-Motion Synthesis | HumanML3D | R-Precision (Top 1) 51.6 | 43 |
| Motion-to-Text | HumanML3D (test) | BLEU@4 8.1 | 40 |
| Text-to-motion generation | HumanML3D 1 (test) | R-Precision (Top 1) 0.516 | 32 |
| Text-to-motion retrieval | HumanML3D | Recall@3 80.2 | 14 |
| Speed-based motion generation | AnyContext (test) | R@1 28.1 | 10 |
| Trajectory-based motion generation | AnyContext (test) | R@1 0.193 | 10 |
Showing 10 of 14 rows
