
X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model

About

Successful generalist Vision-Language-Action (VLA) models rely on effective training across diverse robotic platforms with large-scale, cross-embodiment, heterogeneous datasets. To accommodate and leverage the heterogeneity in rich, diverse robotic data sources, we propose a novel Soft Prompt approach with minimally added parameters, by infusing prompt learning concepts into cross-embodiment robot learning and introducing separate sets of learnable embeddings for each distinct data source. These embeddings serve as embodiment-specific prompts, which together enable VLA models to exploit varying cross-embodiment features effectively. Our new X-VLA, a neat flow-matching-based VLA architecture, relies exclusively on soft-prompted standard Transformer encoders, enjoying both scalability and simplicity. Evaluated across 6 simulations as well as 3 real-world robots, our 0.9B instantiation, X-VLA-0.9B, simultaneously achieves SOTA performance over a sweep of benchmarks, demonstrating superior results along a wide range of capability axes, from flexible dexterity to quick adaptation across embodiments, environments, and tasks. Website: https://thu-air-dream.github.io/X-VLA/
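
To make the per-source prompt idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the class name `SoftPromptBank`, the source names, the prompt length, and the choice to prepend the prompt to the token sequence are all assumptions made for illustration. In a real model the prompt vectors would be trainable parameters updated by gradient descent and fed to a Transformer encoder; here they are plain lists of floats.

```python
import random


class SoftPromptBank:
    """Hypothetical container: one short sequence of embedding vectors per data source."""

    def __init__(self, source_names, prompt_len, dim, seed=0):
        rng = random.Random(seed)
        self.prompt_len = prompt_len
        self.dim = dim
        # Each data source gets its own independently initialised prompt.
        self.prompts = {
            name: [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                   for _ in range(prompt_len)]
            for name in source_names
        }

    def prepend(self, source_name, tokens):
        """Return the source's prompt vectors followed by the token embeddings.

        Prepending is an assumption of this sketch; the abstract does not say
        where the prompts enter the encoder input.
        """
        for tok in tokens:
            if len(tok) != self.dim:
                raise ValueError("token embedding has the wrong dimension")
        return self.prompts[source_name] + tokens

    def num_parameters(self):
        """Parameters added by the prompts: sources * prompt_len * dim."""
        return len(self.prompts) * self.prompt_len * self.dim


# Illustrative usage with made-up source names and sizes.
bank = SoftPromptBank(["widowx", "google_robot"], prompt_len=4, dim=8)
tokens = [[0.0] * 8 for _ in range(10)]   # stand-in for encoded observations
sequence = bank.prepend("widowx", tokens)  # 4 prompt vectors + 10 tokens
```

The added parameter count grows only with the number of data sources, the prompt length, and the embedding width, which is one way to read the abstract's "minimally added parameters".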

Jinliang Zheng, Jianxiong Li, Zhihao Wang, Dongxiu Liu, Xirui Kang, Yuchun Feng, Yinan Zheng, Jiayin Zou, Yilun Chen, Jia Zeng, Ya-Qin Zhang, Jiangmiao Pang, Jingjing Liu, Tai Wang, Xianyuan Zhan • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Robot Manipulation | LIBERO | Goal Achievement | 97.8 | 494 |
| Robot Manipulation | RoboTwin Randomized 2.0 | Success Rate: Place Dual Shoes | 88 | 20 |
| Robot Manipulation | RoboTwin Clean 2.0 | Place Dual Shoes Success | 79 | 20 |
| Robot Manipulation | LIBERO-Plus Zero-shot | Camera Score | 22.2 | 20 |
| Robotic Manipulation | WidowX | Spoon Success Rate | 100 | 17 |
| Robotic Manipulation | Google Robot Variant Aggregation | Pick Success Rate | 85.5 | 15 |
| Language-conditioned manipulation | LIBERO Long | Avg Success Score | 97.6 | 6 |
| Bimanual Robotic Manipulation | RoboTwin Hard 2.0 | Success Rate (H=1) | 82.5 | 5 |
| Bimanual Robotic Manipulation | RoboTwin Easy 2.0 | Success Rate (H=1) | 81.6 | 5 |
| Robotic Manipulation | GenieSim 2.2 | Success Rate: Clear Countertop Waste | 62.2 | 4 |
