
StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing

About

Building generalist embodied agents requires integrating perception, language understanding, and action, which are core capabilities addressed by Vision-Language-Action (VLA) approaches based on multimodal foundation models, including recent advances in vision-language models and world models. Despite rapid progress, VLA methods remain fragmented across incompatible architectures, codebases, and evaluation protocols, hindering principled comparison and reproducibility. We present StarVLA, an open-source codebase for VLA research. StarVLA addresses these challenges in three ways. First, it provides a modular backbone-action-head architecture that supports both VLM backbones (e.g., Qwen-VL) and world-model backbones (e.g., Cosmos) alongside representative action-decoding paradigms, all under a shared abstraction in which the backbone and the action head can each be swapped independently. Second, it provides reusable training strategies, including cross-embodiment learning and multimodal co-training, that apply consistently across supported paradigms. Third, it integrates major benchmarks, including LIBERO, SimplerEnv, RoboTwin 2.0, RoboCasa-GR1, and BEHAVIOR-1K, through a unified evaluation interface that supports both simulation and real-robot deployment. StarVLA also ships simple, fully reproducible single-benchmark training recipes that, despite minimal data engineering, already match or surpass prior methods on multiple benchmarks with both VLM and world-model backbones. To the best of our knowledge, StarVLA is one of the most comprehensive open-source VLA frameworks available, and we expect it to lower the barrier for reproducing existing methods and prototyping new ones. StarVLA is being actively maintained and expanded; we will update this report as the project evolves. The code and documentation are available at https://github.com/starVLA/starVLA.
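The shared backbone and action-head abstraction described above can be illustrated with a minimal sketch. This is not StarVLA's actual API: every name here (`Backbone`, `ActionHead`, `VLAPolicy`, and the toy classes) is hypothetical, and the "models" are trivial arithmetic stand-ins. The sketch only shows how two interfaces let either component be replaced without touching the other.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence


class Backbone(Protocol):
    """Hypothetical interface: turn an observation and instruction into features."""

    def encode(self, image: Sequence[float], instruction: str) -> List[float]: ...


class ActionHead(Protocol):
    """Hypothetical interface: turn backbone features into an action vector."""

    def decode(self, features: Sequence[float]) -> List[float]: ...


class ToyVLMBackbone:
    """Stand-in for a VLM backbone (toy arithmetic, not a real model)."""

    def encode(self, image: Sequence[float], instruction: str) -> List[float]:
        mean_pixel = sum(image) / len(image)
        return [mean_pixel, float(len(instruction.split())), 1.0, 0.0]


class ToyWorldModelBackbone:
    """Stand-in for a world-model backbone (toy arithmetic, not a real model)."""

    def encode(self, image: Sequence[float], instruction: str) -> List[float]:
        return [max(image), min(image), float(len(instruction)), 1.0]


class MeanActionHead:
    """Toy head: repeats the feature mean across all action dimensions."""

    def __init__(self, action_dim: int) -> None:
        self.action_dim = action_dim

    def decode(self, features: Sequence[float]) -> List[float]:
        avg = sum(features) / len(features)
        return [avg] * self.action_dim


class ClippedActionHead:
    """Toy head: clips the first `action_dim` features to [-1, 1]."""

    def __init__(self, action_dim: int) -> None:
        self.action_dim = action_dim

    def decode(self, features: Sequence[float]) -> List[float]:
        return [max(-1.0, min(1.0, f)) for f in features[: self.action_dim]]


@dataclass
class VLAPolicy:
    """Composes any backbone with any action head through the two interfaces."""

    backbone: Backbone
    action_head: ActionHead

    def predict_action(self, image: Sequence[float], instruction: str) -> List[float]:
        features = self.backbone.encode(image, instruction)
        return self.action_head.decode(features)


# Swap either component independently; the policy code does not change.
policy_a = VLAPolicy(ToyVLMBackbone(), MeanActionHead(action_dim=7))
policy_b = VLAPolicy(ToyWorldModelBackbone(), ClippedActionHead(action_dim=3))
action_a = policy_a.predict_action([0.0, 1.0], "pick up the can")
action_b = policy_b.predict_action([0.0, 1.0], "pick up the can")
```

Because `VLAPolicy` depends only on the two interfaces, pairing a new backbone with an existing head (or the reverse) requires no change to the policy itself, which is the property the abstract attributes to the shared abstraction.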

StarVLA Community • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Robot Manipulation | LIBERO | Object Achievement: 98.6 | 957 |
| Robotic Manipulation | LIBERO | Spatial Success Rate: 98.9 | 527 |
| Robotic Manipulation | LIBERO-Plus | Language Understanding Score: 81.8 | 249 |
| Robotic Manipulation | RoboTwin 2.0 | -- | 100 |
| Robot Manipulation | SimplerEnv Google Robot tasks Variant Aggregation | Average Success Rate: 70.2 | 88 |
| Robot Manipulation | SimplerEnv Google Robot Visual Matching | Pick Coke Can: 95.3 | 65 |
| Robotic Manipulation | LIBERO | Spatial Success Rate: 98 | 52 |
| Robotic Manipulation | SIMPLER Visual Matching WidowX robot | Put Spoon on Towel Score: 90.3 | 51 |
| Robot Manipulation | RoboTwin Clean 2.0 | -- | 39 |
| Robot Manipulation | SimplerEnv WidowX Visual Matching | Average Success Rate: 65.3 | 34 |
Showing 10 of 31 rows
