
MetaVLA: Unified Meta Co-training For Efficient Embodied Adaption

About

Vision-Language-Action (VLA) models show promise in embodied reasoning, yet remain far from true generalists: they often require task-specific fine-tuning, incur high compute costs, and generalize poorly to unseen tasks. We propose MetaVLA, a unified, backbone-agnostic post-training framework for efficient and scalable alignment. MetaVLA introduces Context-Aware Meta Co-Training, which consolidates diverse target tasks into a single fine-tuning stage while leveraging structurally diverse auxiliary tasks to improve in-domain generalization. Unlike naive multi-task supervised fine-tuning (SFT), MetaVLA integrates a lightweight meta-learning mechanism, derived from Attentive Neural Processes, that enables rapid adaptation from diverse contexts with minimal architectural change or inference overhead. On the LIBERO benchmark, MetaVLA with six auxiliary tasks outperforms OpenVLA by up to 8.0% on long-horizon tasks, reduces training steps from 240K to 75K, and cuts GPU time by ~76%. These results show that scalable, low-resource post-training is achievable, paving the way toward general-purpose embodied agents. Code will be made available.
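The abstract names Attentive Neural Processes (ANPs) as the source of MetaVLA's meta-learning mechanism but does not say how it is wired into the VLA backbone. As a hedged illustration of the general ANP idea only, the sketch below shows a simplified deterministic path: each context pair (x_i, y_i) is encoded into a value vector, and each target input attends over all contexts with scaled dot-product cross-attention to get its own aggregated representation, which a decoder then maps to an output. This is not MetaVLA's implementation. All names (`anp_deterministic_path`, `init_params`, `predict`), the dimensions, the shared linear key/query projection and the random untrained weights are hypothetical, and the ANP latent path and self-attention layers are omitted. The output size of 7 is chosen only to mirror a typical 7-DoF end-effector action.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mlp(x, w1, w2):
    # Two-layer perceptron with ReLU, used here as a stand-in pair encoder.
    return np.maximum(x @ w1, 0.0) @ w2

def init_params(dx, dy, dr, dk, hidden=32, seed=0):
    # Hypothetical random (untrained) weights, for illustration only.
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "enc_w1": rng.normal(0, scale, (dx + dy, hidden)),
        "enc_w2": rng.normal(0, scale, (hidden, dr)),
        "qk_w": rng.normal(0, scale, (dx, dk)),
        "dec_w": rng.normal(0, scale, (dx + dr, dy)),
    }

def anp_deterministic_path(ctx_x, ctx_y, tgt_x, params):
    """Aggregate context pairs into one representation per target query.

    ctx_x: (N, dx) context inputs, ctx_y: (N, dy) context outputs,
    tgt_x: (M, dx) target inputs. Returns (M, dr) representations
    and the (M, N) attention weights.
    """
    # Encode each (x_i, y_i) pair into a value vector r_i.
    pairs = np.concatenate([ctx_x, ctx_y], axis=-1)
    values = mlp(pairs, params["enc_w1"], params["enc_w2"])
    # Keys come from context inputs, queries from target inputs.
    keys = ctx_x @ params["qk_w"]
    queries = tgt_x @ params["qk_w"]
    # Scaled dot-product cross-attention: each target attends over all contexts.
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ values, weights

def predict(ctx_x, ctx_y, tgt_x, params):
    r, weights = anp_deterministic_path(ctx_x, ctx_y, tgt_x, params)
    # A linear decoder maps (target input, aggregated context) to an output.
    y_hat = np.concatenate([tgt_x, r], axis=-1) @ params["dec_w"]
    return y_hat, weights

# Usage: 5 context pairs (e.g. embedded demonstrations), 3 target queries.
rng = np.random.default_rng(1)
dx, dy, dr, dk = 8, 7, 16, 16
params = init_params(dx, dy, dr, dk)
ctx_x = rng.normal(size=(5, dx))
ctx_y = rng.normal(size=(5, dy))
tgt_x = rng.normal(size=(3, dx))
y_hat, weights = predict(ctx_x, ctx_y, tgt_x, params)
print(y_hat.shape, weights.shape)  # (3, 7) (3, 5)
```

Because the aggregation is an attention-weighted sum over contexts, the prediction does not depend on the order of the context set, and adding new contexts requires no extra parameters. That property is what makes ANP-style conditioning a plausible fit for adapting from variable, diverse task contexts.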

Chen Li, Zhantao Yang, Han Zhang, Fangyi Chen, Chenchen Zhu, Anudeepsekhar Bolimera, Marios Savvides • 2025

Related benchmarks

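Task                 | Dataset                        | Metric               | Result (%) | Rank
---------------------+--------------------------------+----------------------+------------+-----
Robot Manipulation   | LIBERO Object                  | Success Rate         | 87         | 127
Robot Manipulation   | LIBERO                         | Spatial Success Rate | 88         | 116
Robotic Manipulation | LIBERO Long                    | Success Rate         | 55         | 91
Robotic Manipulation | LIBERO Goal                    | Success Rate         | 77         | 42
Robotic Manipulation | LIBERO (average across suites) | Success Rate (SR)    | 76         | 29
Robotic Manipulation | LIBERO Spatial                 | Success Rate (SR)    | 85         | 28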
