MotionAnymesh: Physics-Grounded Articulation for Simulation-Ready Digital Twins
About
Converting static 3D meshes into interactable articulated assets is crucial for embodied AI and robotic simulation, yet existing zero-shot pipelines struggle with complex assets because they lack physical grounding. Specifically, ungrounded Vision-Language Models (VLMs) frequently hallucinate kinematics, and unconstrained joint estimation leads to severe mesh inter-penetration during physical simulation. To bridge this gap, we propose MotionAnymesh, an automated zero-shot framework that transforms unstructured static meshes into simulation-ready digital twins. Our method features a kinematic-aware part segmentation module that grounds VLM reasoning with explicit SP4D physical priors, suppressing kinematic hallucinations. We further introduce a geometry-physics joint estimation pipeline that combines robust type-aware initialization with physics-constrained trajectory optimization to guarantee collision-free articulation. Extensive experiments show that MotionAnymesh significantly outperforms state-of-the-art baselines in both geometric precision and dynamic physical executability, providing reliable assets for downstream applications.
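To make the physics-constrained trajectory optimization concrete, the sketch below shows the core idea in miniature: sweep a candidate joint (here a revolute axis) through its motion range and penalize configurations where the moving part penetrates the static part. This is an illustrative NumPy-only toy, not the paper's implementation — the function names (`rotate`, `penetration_penalty`, `trajectory_penalty`), the point-cloud proxy for the mesh, and the distance-threshold collision test are all simplifying assumptions.

```python
import numpy as np

def rotate(points, axis, pivot, angle):
    # Rodrigues rotation of points about a revolute joint axis through `pivot`.
    axis = axis / np.linalg.norm(axis)
    p = points - pivot
    c, s = np.cos(angle), np.sin(angle)
    cross = np.cross(np.broadcast_to(axis, p.shape), p)
    dot = p @ axis
    return p * c + cross * s + np.outer(dot, axis) * (1 - c) + pivot

def penetration_penalty(moving, static, radius=0.05):
    # Soft collision cost: penalize each moving point whose distance to the
    # nearest static point falls below `radius` (a crude inter-penetration proxy).
    d = np.linalg.norm(moving[:, None, :] - static[None, :, :], axis=-1)
    return float(np.sum(np.maximum(radius - d.min(axis=1), 0.0)))

def trajectory_penalty(moving, static, axis, pivot, max_angle, steps=10):
    # Accumulate penetration over sampled joint angles along the full trajectory,
    # so a joint hypothesis is scored by its entire motion, not just the rest pose.
    return sum(
        penetration_penalty(rotate(moving, axis, pivot, a), static)
        for a in np.linspace(0.0, max_angle, steps)
    )
```

A joint candidate whose swept motion stays clear of the static geometry scores zero, while one whose trajectory drives the part into an obstacle accrues a positive cost that an optimizer can minimize over axis and pivot parameters.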
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Joint parameter estimation | Diverse 3D Assets (test) | Type Error | 0.08 | 6 |
| Part Segmentation | Diverse 3D Assets (test) | mIoU | 0.86 | 6 |
| Physical Executability | Diverse 3D Assets (test) | Executability | 87 | 6 |