MetaWorld: Skill Transfer and Composition in a Hierarchical World Model for Grounding High-Level Instructions

About

Humanoid robot loco-manipulation remains constrained by the semantic-physical gap. Current methods face three limitations: low sample efficiency in reinforcement learning, poor generalization in imitation learning, and physical inconsistency in vision-language models (VLMs). We propose MetaWorld, a hierarchical world model that integrates semantic planning and physical control via expert policy transfer. The framework decouples each task into a VLM-driven semantic layer and a latent dynamics model that operates in a compact state space. A dynamic expert selection and motion prior fusion mechanism treats a pre-trained multi-expert policy library as transferable knowledge, enabling efficient online adaptation through a two-stage framework. VLMs serve as semantic interfaces that map instructions to executable skills, bypassing symbol grounding. Experiments on Humanoid-Bench show that MetaWorld outperforms world-model-based RL in task completion and motion coherence. Our code will be available at https://anonymous.4open.science/r/metaworld-2BF4/

Yutong Shen, Hangxu Liu, Kailin Pei, Ruizhe Xia, Tongtong Feng • 2026

Related benchmarks

Task                      Dataset                          Metric   Result    Rank
Locomotion                Humanoid-Bench Stand (test)      Return   793.4     3
Locomotion                Humanoid-Bench Walk (test)       Return   701.2     3
Locomotion                Humanoid-Bench Run (test)        Return   1.69e+3   3
Manipulation              Humanoid-Bench Door (test)       Return   680       3
Robot Control Aggregate   Humanoid-Bench Average (test)    Return   966.1     3
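
As a quick consistency check on the table above, the short sketch below recomputes the aggregate from the four per-task returns. It assumes that the Humanoid-Bench Average row is an unweighted mean over Stand, Walk, Run, and Door; the page does not state how the aggregate is computed, so this is an assumption. The Run value is listed only to three significant figures (1.69e+3), so the recomputed mean can match the listed 966.1 only approximately. The variable names are illustrative.

```python
from statistics import mean

# Per-task returns as listed in the table (Run is shown rounded: 1.69e+3).
returns = {
    "Stand": 793.4,
    "Walk": 701.2,
    "Run": 1.69e3,
    "Door": 680.0,
}

# Assumption: the aggregate row is an unweighted mean over the four tasks.
average_return = mean(returns.values())

# Value listed in the "Humanoid-Bench Average (test)" row.
listed_average = 966.1

print(f"recomputed mean: {average_return:.2f}")
print(f"difference from listed value: {abs(average_return - listed_average):.2f}")
```

Under that assumption the recomputed mean is about 966.15, which agrees with the listed 966.1 to within the rounding of the Run entry.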