StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing

About

Building generalist embodied agents requires integrating perception, language understanding, and action. These are the core capabilities targeted by Vision-Language-Action (VLA) approaches built on multimodal foundation models, including recent advances in vision-language models (VLMs) and world models. Despite rapid progress, VLA methods remain fragmented across incompatible architectures, codebases, and evaluation protocols, which hinders principled comparison and reproducibility. We present StarVLA, an open-source codebase for VLA research. StarVLA addresses these challenges in three ways. First, it provides a modular backbone–action-head architecture that supports both VLM backbones (e.g., Qwen-VL) and world-model backbones (e.g., Cosmos) alongside representative action-decoding paradigms, all under a shared abstraction in which the backbone and the action head can each be swapped independently. Second, it provides reusable training strategies, including cross-embodiment learning and multimodal co-training, that apply consistently across the supported paradigms. Third, it integrates major benchmarks, including LIBERO, SimplerEnv, RoboTwin 2.0, RoboCasa-GR1, and BEHAVIOR-1K, through a unified evaluation interface that supports both simulation and real-robot deployment. StarVLA also ships simple, fully reproducible single-benchmark training recipes that, despite minimal data engineering, already match or surpass prior methods on multiple benchmarks with both VLM and world-model backbones. To the best of our knowledge, StarVLA is one of the most comprehensive open-source VLA frameworks available, and we expect it to lower the barrier to reproducing existing methods and prototyping new ones. StarVLA is actively maintained and expanded; we will update this report as the project evolves. The code and documentation are available at https://github.com/starVLA/starVLA.
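To make the "swap backbone and action head independently" idea concrete, here is a minimal, dependency-free sketch in Python. It is not StarVLA's actual API: every name in it (`Backbone`, `ActionHead`, `Policy`, the toy classes, and the `encode`/`decode`/`act` methods) is a hypothetical stand-in chosen for illustration, and the toy arithmetic replaces real neural networks. The only point is that a policy composed from two narrow interfaces lets either side be replaced without touching the other.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence


class Backbone(Protocol):
    """Hypothetical interface: observation + instruction -> feature vector."""

    def encode(self, image: Sequence[float], instruction: str) -> List[float]: ...


class ActionHead(Protocol):
    """Hypothetical interface: feature vector -> action vector."""

    def decode(self, features: Sequence[float]) -> List[float]: ...


class ToyVLMBackbone:
    """Stand-in for a VLM backbone; produces 2 toy features."""

    def encode(self, image: Sequence[float], instruction: str) -> List[float]:
        mean_pixel = sum(image) / len(image)
        return [mean_pixel, float(len(instruction.split()))]


class ToyWorldModelBackbone:
    """Stand-in for a world-model backbone; produces 3 toy features."""

    def encode(self, image: Sequence[float], instruction: str) -> List[float]:
        return [max(image), min(image), float(len(instruction))]


class MeanActionHead:
    """Toy head: repeats the mean feature across all action dimensions."""

    def __init__(self, action_dim: int) -> None:
        self.action_dim = action_dim

    def decode(self, features: Sequence[float]) -> List[float]:
        value = sum(features) / len(features)
        return [value] * self.action_dim


class ClippedActionHead(MeanActionHead):
    """Toy head: same as MeanActionHead, but clips actions to [-1, 1]."""

    def decode(self, features: Sequence[float]) -> List[float]:
        return [max(-1.0, min(1.0, a)) for a in super().decode(features)]


@dataclass
class Policy:
    """Composes any Backbone with any ActionHead."""

    backbone: Backbone
    action_head: ActionHead

    def act(self, image: Sequence[float], instruction: str) -> List[float]:
        features = self.backbone.encode(image, instruction)
        return self.action_head.decode(features)


# Three policies that differ in exactly one component each.
image, instruction = [0.0, 1.0], "pick up the can"
base = Policy(ToyVLMBackbone(), MeanActionHead(action_dim=3))
new_head = Policy(ToyVLMBackbone(), ClippedActionHead(action_dim=3))
new_backbone = Policy(ToyWorldModelBackbone(), MeanActionHead(action_dim=3))
actions = [p.act(image, instruction) for p in (base, new_head, new_backbone)]
```

Because `Policy` depends only on the two interfaces, the toy backbones can emit feature vectors of different lengths and the heads still work; a real framework would need an explicit contract (or a projection layer) for feature shapes, which this sketch glosses over.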

StarVLA Community • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Robotic Manipulation | LIBERO | Spatial Success Rate | 98.9 | 314
Robot Manipulation | SimplerEnv Google Robot tasks Variant Aggregation | Average Success Rate | 70.2 | 67
Robotic Manipulation | SIMPLER Visual Matching WidowX robot | Put Spoon on Towel Score | 90.3 | 51
Robot Manipulation | SimplerEnv Google Robot Visual Matching | Pick Coke Can | 95.3 | 43
Robot Manipulation | RoboTwin Clean 2.0 | -- | -- | 24
Robot Manipulation | RoboTwin Randomized 2.0 | -- | -- | 20
Robot Manipulation | RoboCasa-GR1 24 tasks | Average Success Rate | 48.8 | 10
Robot Manipulation | RoboTwin | Success Rate (Click Bell) | 71 | 6