SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control

About

Despite the rise of billion-parameter foundation models trained across thousands of GPUs, similar scaling gains have not been shown for humanoid control. Current neural controllers for humanoids remain modest in size, target a limited set of behaviors, and are trained on a handful of GPUs. We show that scaling model capacity, data, and compute yields a generalist humanoid controller capable of natural, robust whole-body movements. We position motion tracking as a scalable task for humanoid control, leveraging dense supervision from diverse motion-capture data to acquire human motion priors without manual reward engineering. We build a foundation model for motion tracking by scaling along three axes: network size (1.2M to 42M parameters), dataset volume (100M+ frames from 700 hours of motion capture), and compute (21k GPU hours). Beyond demonstrating the benefits of scale, we further show downstream utility through: (1) a real-time kinematic planner bridging motion tracking to tasks such as navigation, enabling natural and interactive control, and (2) a unified token space supporting VR teleoperation and vision-language-action (VLA) models with a single policy. Through this interface, we demonstrate autonomous VLA-driven whole-body loco-manipulation requiring coordinated hand and foot placement. Scaling motion tracking exhibits favorable properties: performance improves steadily with compute and data diversity, and learned policies generalize to unseen motions, establishing motion tracking at scale as a practical foundation for humanoid control.

Zhengyi Luo, Ye Yuan, Tingwu Wang, Chenran Li, Fernando Castañeda, Sirui Chen, Zi-Ang Cao, Jiefeng Li, David Minor, Qingwei Ben, Jinhyung Park, David Sami, Zi Wang, Xingye Da, Runyu Ding, Cyrus Hogg, Lina Song, Edy Lim, Eugene Jeong, Tairan He, Haoru Xue, Wenli Xiao, Simon Yuen, Jan Kautz, Yan Chang, Umar Iqbal, Linxi "Jim" Fan, Yuke Zhu • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Fall-and-recovery evaluation | Fall-and-recovery sequences lie-to-stand, prone-to-stand, and stand-to-lie (test) | CR | 42.8 | 8 |
| Humanoid motion tracking | MuJoCo 101 held-out motion sequences (test) | CR (%) | 79.3 | 8 |
| Human-to-robot clip-level retrieval | DPAE (val) | R@1 | 97.8 | 6 |
| Robot-to-robot clip-level retrieval | DPAE (val) | R@1 | 97.2 | 6 |
| Robot-to-human clip-level retrieval | DPAE (val) | R@1 | 97 | 6 |
| Motion Tracking | Diverse Static and Dynamic Motions 2001 sequences | Success Rate | 1.88e+3 | 5 |
| Humanoid motion tracking | MuJoCo evaluation suite (out-of-domain) | Empkpe | 227.9 | 4 |
