
X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model

About

Successful generalist Vision-Language-Action (VLA) models rely on effective training across diverse robotic platforms with large-scale, cross-embodiment, heterogeneous datasets. To accommodate and exploit the heterogeneity of rich, diverse robotic data sources, we propose a novel Soft Prompt approach that adds minimal parameters: it brings prompt-learning concepts into cross-embodiment robot learning by introducing a separate set of learnable embeddings for each distinct data source. These embeddings serve as embodiment-specific prompts, which together enable VLA models to exploit varying cross-embodiment features effectively. Our new X-VLA, a neat flow-matching-based VLA architecture, relies exclusively on soft-prompted standard Transformer encoders, enjoying both scalability and simplicity. Evaluated across 6 simulations as well as 3 real-world robots, our 0.9B instantiation, X-VLA-0.9B, simultaneously achieves SOTA performance over a sweep of benchmarks, demonstrating superior results on a wide range of capabilities, from flexible dexterity to quick adaptation across embodiments, environments, and tasks. Website: https://thu-air-dream.github.io/X-VLA/
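The per-source soft prompt idea described above can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation: the class name `SoftPromptBank`, the source names, the prompt length, the embedding width, and the Gaussian initialization are all assumptions made for illustration. It shows only the bookkeeping: each data source owns its own small set of embedding vectors, and those vectors are prepended to the token sequence before it would be passed to a standard Transformer encoder.

```python
import random


class SoftPromptBank:
    """Holds one soft prompt (a short sequence of vectors) per data source."""

    def __init__(self, source_names, prompt_len, dim, seed=0):
        rng = random.Random(seed)
        # Assumed initialization: small Gaussian noise. In a real model these
        # values would be trainable parameters updated by gradient descent.
        self.prompts = {
            name: [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                   for _ in range(prompt_len)]
            for name in source_names
        }

    def prepend(self, source_name, tokens):
        """Return the source's prompt vectors followed by the input tokens."""
        return self.prompts[source_name] + tokens


# Hypothetical data-source names and sizes, chosen only for illustration.
bank = SoftPromptBank(["source_a", "source_b"], prompt_len=4, dim=8)
tokens = [[0.0] * 8 for _ in range(10)]  # stand-in for vision/language tokens
sequence = bank.prepend("source_a", tokens)
print(len(sequence), len(sequence[0]))  # prints "14 8"
```

Because only the prompt vectors differ between sources, the number of added parameters in this sketch is `prompt_len * dim` per data source, which matches the abstract's emphasis on minimal added parameters.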

Jinliang Zheng, Jianxiong Li, Zhihao Wang, Dongxiu Liu, Xirui Kang, Yuchun Feng, Yinan Zheng, Jiayin Zou, Yilun Chen, Jia Zeng, Ya-Qin Zhang, Jiangmiao Pang, Jingjing Liu, Tai Wang, Xianyuan Zhan • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Robot Manipulation | LIBERO | Goal Achievement | 97.8 | 700 |
| Robotic Manipulation | LIBERO | Spatial Success Rate | 97.8 | 314 |
| Robot Manipulation | LIBERO (test) | Average Success Rate | 98.1 | 184 |
| Robotic Manipulation | Calvin ABCD→D | Avg Length | 4.151 | 89 |
| Robot Manipulation | LIBERO-Plus Zero-shot | Camera Score | 22.2 | 28 |
| Robot Manipulation | RoboTwin Clean 2.0 | Place Dual Shoes Success | 79 | 24 |
| Robot Manipulation | Simpler-Bridge v1 (test) | Success Rate (Spoon) | 100 | 21 |
| Robot Manipulation | RoboTwin Randomized 2.0 | Success Rate: Place Dual Shoes | 88 | 20 |
| Robotic Manipulation | WISER (train) | Grasp Success Rate | 100 | 18 |
| Robotic Manipulation | WISER (test) | Grasp Success | 44 | 18 |
Showing 10 of 23 rows
