StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing

About

Building generalist embodied agents requires integrating perception, language understanding, and action, which are core capabilities addressed by Vision-Language-Action (VLA) approaches based on multimodal foundation models, including recent advances in vision-language models and world models. Despite rapid progress, VLA methods remain fragmented across incompatible architectures, codebases, and evaluation protocols, hindering principled comparison and reproducibility. We present StarVLA, an open-source codebase for VLA research. StarVLA addresses these challenges in three aspects. First, it provides a modular backbone--action-head architecture that supports both VLM backbones (e.g., Qwen-VL) and world-model backbones (e.g., Cosmos) alongside representative action-decoding paradigms, all under a shared abstraction in which backbone and action head can each be swapped independently. Second, it provides reusable training strategies, including cross-embodiment learning and multimodal co-training, that apply consistently across supported paradigms. Third, it integrates major benchmarks, including LIBERO, SimplerEnv, RoboTwin~2.0, RoboCasa-GR1, and BEHAVIOR-1K, through a unified evaluation interface that supports both simulation and real-robot deployment. StarVLA also ships simple, fully reproducible single-benchmark training recipes that, despite minimal data engineering, already match or surpass prior methods on multiple benchmarks with both VLM and world-model backbones. To our best knowledge, StarVLA is one of the most comprehensive open-source VLA frameworks available, and we expect it to lower the barrier for reproducing existing methods and prototyping new ones. StarVLA is being actively maintained and expanded; we will update this report as the project evolves. The code and documentation are available at https://github.com/starVLA/starVLA.

StarVLA Community• 2026

Related benchmarks

Task	Dataset	Result
Robot Manipulation	LIBERO	Object Achievement98.6	957
Robotic Manipulation	LIBERO	Spatial Success Rate98.9	527
Robotic Manipulation	LIBERO-Plus	Language Understanding Score81.8	249
Robotic Manipulation	RoboTwin 2.0	--	100
Robot Manipulation	SimplerEnv Google Robot tasks Variant Aggregation	Average Success Rate70.2	88
Robot Manipulation	SimplerEnv Google Robot Visual Matching	Pick Coke Can95.3	65
Robotic Manipulation	LIBERO	Spatial Success Rate98	52
Robotic Manipulation	SIMPLER Visual Matching WidowX robot	Put Spoon on Towel Score90.3	51
Robot Manipulation	RoboTwin Clean 2.0	--	39
Robot Manipulation	SimplerEnv WidowX Visual Matching	Average Success Rate65.3	34

Showing 10 of 31 rows

Other info

Follow for update

@wizwand_team Discord