Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data

About

Generating diverse and natural human motion sequences from textual descriptions is a fundamental and challenging research problem in computer vision, graphics, and robotics. Despite significant advances in this field, current methods often struggle to generalize zero-shot, largely because of the limited size of their training datasets. Moreover, the lack of a comprehensive evaluation framework impedes progress on this task because it fails to identify directions for improvement. In this work, we aim to push text-to-motion into a new era, one in which models achieve zero-shot generalization. To this end, we first develop an efficient annotation pipeline and introduce MotionMillion, the largest human motion dataset to date, featuring over 2,000 hours of data and 2 million high-quality motion sequences. Additionally, we propose MotionMillion-Eval, the most comprehensive benchmark for evaluating zero-shot motion generation. Leveraging a scalable architecture, we scale our model to 7B parameters and validate its performance on MotionMillion-Eval. Our results demonstrate strong generalization to out-of-domain and complex compositional motions, marking a significant step toward zero-shot human motion generation. The code is available at https://github.com/VankouF/MotionMillion-Codes.

Ke Fan, Shunlin Lu, Minyue Dai, Runyi Yu, Lixing Xiao, Zhiyang Dou, Junting Dong, Lizhuang Ma, Jingbo Wang • 2025

Related benchmarks

Task	Dataset	Result
Text-to-Motion Synthesis	HumanML3D	R-Precision (Top 1): 61.6	43
Motion Generation	MBench 16 (official leaderboard)	Jitter Penalty: 0.012	17
Text-to-motion	MotionHub (test)	R-Precision (T1): 29.3	12
Text-to-motion	HumanML3D 10 (test)	R-Precision@1: 44.3	12
Motion Reconstruction	HumanML3D (test)	MPJPE: 41.9	12
3D Human Motion Generation	Motion-X++ (test)	FID: 14.47	7
3D Human Motion Generation	CoMoVi Dataset (test)	FID: 1.641	7
Text-to-motion	Custom Diverse Text-to-Motion 1.0 (test)	Locomotion Score: 3.11	5
Text-to-motion generation	Text-to-motion 1.0 (test)	Locomotion Score: 2.8	5
Motion Reconstruction	MotionMillion (test)	MPJPE (mm): 45.5	3

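Several rows above report MPJPE and top-1 R-Precision. As a rough guide to what those numbers measure, here is a minimal NumPy sketch of both under stated assumptions; it is not the MotionMillion evaluation code. MPJPE is taken as the mean Euclidean distance between predicted and ground-truth 3D joints over all frames and joints, with no root or Procrustes alignment (protocols differ on this point). For R-Precision, the hypothetical helper `r_precision_top1` assumes motion and text embeddings already produced by a pretrained text-motion evaluator, with row i of each array forming a matched pair. HumanML3D-style evaluation additionally computes this over fixed-size candidate pools and reports it as a percentage.

```python
import numpy as np


def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Per-Joint Position Error (hypothetical helper, generic definition).

    pred, gt: arrays of shape (frames, joints, 3) in the same unit (e.g. mm).
    Assumption: no root alignment or Procrustes step is applied.
    """
    if pred.shape != gt.shape or pred.shape[-1] != 3:
        raise ValueError("expected matching (frames, joints, 3) arrays")
    per_joint = np.linalg.norm(pred - gt, axis=-1)  # distance per (frame, joint)
    return float(per_joint.mean())


def r_precision_top1(motion_emb: np.ndarray, text_emb: np.ndarray) -> float:
    """Top-1 R-Precision over one candidate pool (hypothetical helper).

    motion_emb, text_emb: shape (n, d); row i of each is a matched pair.
    Assumption: embeddings come from a pretrained text-motion evaluator.
    Returns the fraction of motions whose nearest text (Euclidean) is their own.
    """
    # dists[i, j] = distance between motion i and text j
    dists = np.linalg.norm(motion_emb[:, None, :] - text_emb[None, :, :], axis=-1)
    hits = dists.argmin(axis=1) == np.arange(len(motion_emb))
    return float(hits.mean())


# Usage: every joint is offset by (3, 4, 0), so each per-joint error is 5.
pred = np.zeros((2, 22, 3))
gt = pred + np.array([3.0, 4.0, 0.0])
print(mpjpe(pred, gt))  # 5.0
```

Because MPJPE averages raw distances, it inherits the unit of the input coordinates, which is why a value such as 45.5 is only meaningful alongside its unit (mm) and alignment convention.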
