InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions

About

Achieving realistic simulations of humans interacting with a wide range of objects has long been a fundamental goal. Extending physics-based motion imitation to complex human-object interactions (HOIs) is challenging due to intricate human-object coupling, variability in object geometries, and artifacts in motion capture data, such as inaccurate contacts and limited hand detail. We introduce InterMimic, a framework that enables a single policy to robustly learn from hours of imperfect MoCap data covering diverse full-body interactions with dynamic and varied objects. Our key insight is to employ a curriculum strategy: perfect first, then scale up. We first train subject-specific teacher policies to mimic, retarget, and refine motion capture data. Next, we distill these teachers into a student policy, with the teachers acting as online experts providing direct supervision, as well as high-quality references. Notably, we incorporate RL fine-tuning on the student policy to surpass mere demonstration replication and achieve higher-quality solutions. Our experiments demonstrate that InterMimic produces realistic and diverse interactions across multiple HOI datasets. The learned policy generalizes in a zero-shot manner and seamlessly integrates with kinematic generators, elevating the framework from mere imitation to generative modeling of complex human-object interactions.
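The teacher-to-student pipeline described above can be illustrated with a toy sketch. It assumes a hypothetical 1-D tracking task (state = offset from a reference pose) in place of a physics simulator, a hypothetical linear teacher (`teacher`, `TEACHER_GAIN`), and DAgger-style aggregation as one reading of "online experts": the student acts, the teacher labels every state the student visits. The RL fine-tuning stage is stood in for by plain finite-difference gradient ascent on the episode return, which is far simpler than whatever RL algorithm the paper uses. All names here are illustrative, not InterMimic's code.

```python
import random

# Toy stand-in for motion tracking (NOT InterMimic's environment): the state s is
# the offset from a reference pose, and the action a is added to it each step.
TEACHER_GAIN = -0.8  # hypothetical teacher: tracks well, but not optimally
HORIZON = 20
STARTS = (-1.0, -0.5, 0.5, 1.0)  # fixed initial offsets used to score a policy


def teacher(s: float) -> float:
    """Hypothetical subject-specific teacher policy (a proportional controller)."""
    return TEACHER_GAIN * s


def rollout(w, s0):
    """Run the linear student a = w * s; return visited states and episode return."""
    states, ret, s = [], 0.0, s0
    for _ in range(HORIZON):
        states.append(s)
        s = s + w * s  # toy dynamics: next state = state + action
        ret -= s * s   # reward: negative squared tracking error
    return states, ret


def average_return(w):
    """Mean episode return of the student over the fixed start states."""
    return sum(rollout(w, s0)[1] for s0 in STARTS) / len(STARTS)


def distill(iters=5, episodes=10, seed=0):
    """DAgger-style distillation: the student acts, the teacher labels online."""
    rng = random.Random(seed)
    w = 0.0  # untrained student
    xs, ys = [], []  # aggregated (state, teacher action) pairs
    for _ in range(iters):
        for _ in range(episodes):
            # Roll out the STUDENT so the labels cover states it actually visits.
            states, _ = rollout(w, rng.uniform(-1.0, 1.0))
            xs.extend(states)
            ys.extend(teacher(x) for x in states)
        # Least-squares fit of a = w * s to the teacher's actions.
        w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return w


def fine_tune(w, steps=100, lr=0.1, eps=1e-3):
    """Stand-in for RL fine-tuning: finite-difference gradient ascent on return."""
    for _ in range(steps):
        grad = (average_return(w + eps) - average_return(w - eps)) / (2 * eps)
        w += lr * grad
    return w


w_student = distill()           # imitates the teacher: w close to -0.8
w_tuned = fine_tune(w_student)  # reward pushes past the teacher toward w = -1
print(f"distilled w = {w_student:.3f}, fine-tuned w = {w_tuned:.3f}")
```

Distillation alone can only reproduce the teacher's gain; the reward-driven stage is what moves the student past it, which mirrors the abstract's point that fine-tuning lets the student "surpass mere demonstration replication." Classic DAgger also mixes teacher actions into early rollouts, and the teachers' role as sources of refined reference motion is not modeled here.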

Sirui Xu, Hung Yu Ling, Yu-Xiong Wang, Liang-Yan Gui • 2025

Related benchmarks

Task	Dataset	Result
HOI Motion Imitation	GRAB	Success Ratio50	40
Humanoid Loco-manipulation	350 tasks (train)	Success Rate 120.64	10
Humanoid Loco-manipulation	66 unseen tasks (test)	Success Rate 117.22	10
3D Human-Object Interaction Imitation	GRAB	MPJPE (Body) 61.11	8
Dynamic Human-Object Interaction Imitation	Dynamic HOI Imitation (test)	SR 52.6	7
Adaptation to novel objects and interaction skills	BEHAVE	Success Rate 38.9	4
Adaptation to novel objects and interaction skills	HODome	Success Rate 55.5	4
Multi-agent Human-Object Interaction Motion Imitation	OMOMO	Success Rate 7.26	4
Full-reference imitation	OMOMO select	SR 63.9	2
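The MPJPE (Body) row reports a mean per-joint position error, where lower is better. The standard definition averages the Euclidean distance between each simulated joint and its reference joint over all joints and frames. The sketch below assumes that definition and millimeter units; the table states neither, and it does not say which joints the benchmark counts as "Body". The function name `mpjpe` and the sample poses are illustrative.

```python
import math
from typing import Sequence, Tuple

Joint = Tuple[float, float, float]


def mpjpe(pred: Sequence[Sequence[Joint]], ref: Sequence[Sequence[Joint]]) -> float:
    """Mean per-joint position error: average Euclidean distance between
    simulated and reference joints over every frame and joint."""
    if len(pred) != len(ref):
        raise ValueError("pred and ref must have the same number of frames")
    total, count = 0.0, 0
    for frame_pred, frame_ref in zip(pred, ref):
        if len(frame_pred) != len(frame_ref):
            raise ValueError("each frame must list the same joints")
        for p, r in zip(frame_pred, frame_ref):
            total += math.dist(p, r)  # Euclidean distance for one joint
            count += 1
    if count == 0:
        raise ValueError("no joints to compare")
    return total / count


# Two frames, two joints each; positions in millimeters (assumed unit).
pred = [[(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)], [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
ref = [[(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)], [(3.0, 4.0, 0.0), (0.0, 0.0, 12.0)]]
print(mpjpe(pred, ref))  # (0 + 0 + 5 + 12) / 4 = 4.25
```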

