StarVLA-$\alpha$: Reducing Complexity in Vision-Language-Action Systems
About
Vision-Language-Action (VLA) models have recently emerged as a promising paradigm for building general-purpose robotic agents. However, the VLA landscape remains highly fragmented and complex: existing approaches vary substantially in architectures, training data, embodiment configurations, and benchmark-specific engineering. In this work, we introduce StarVLA-$\alpha$, a simple yet strong baseline designed to study VLA design choices under controlled conditions. StarVLA-$\alpha$ deliberately minimizes architectural and pipeline complexity to reduce experimental confounders and enable systematic analysis. Specifically, we re-evaluate several key design axes, including action modeling strategies, robot-specific pretraining, and interface engineering. Under unified multi-benchmark training on LIBERO, SimplerEnv, RoboTwin, and RoboCasa, the same simple baseline remains highly competitive, indicating that a strong VLM backbone combined with a minimal design is already sufficient for strong performance, without additional architectural complexity or engineering tricks. Notably, our single generalist model outperforms $\pi_{0.5}$ by 20\% on the public real-world RoboChallenge benchmark. We expect StarVLA-$\alpha$ to serve as a solid starting point for future research in the VLA regime. Code will be released at https://github.com/starVLA/starVLA.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Robotic Manipulation | LIBERO | Spatial Success Rate | 99 | 314 |
| Robotic Manipulation | LIBERO-Plus | Average Score | 79.7 | 107 |
| Robotic Manipulation | RoboTwin 2.0 | -- | -- | 64 |
| Robot Manipulation | RoboCasa-GR1 24 tasks | Average Success Rate | 57.3 | 10 |
| Robotic Manipulation | SimplerEnv | WidowX Score | 65.2 | 7 |
| arrange flowers | Table30 RoboChallenge ARX5 | Success Rate | 40 | 6 |
| arrange paper cups | RoboChallenge | Success Rate (SR) | 20 | 3 |
| fold dishcloth | RoboChallenge | Success Rate | 0 | 3 |
| place shoes on rack | RoboChallenge | Success Rate (SR) | 50 | 3 |
| put cup on coaster | RoboChallenge | Success Rate | 100 | 3 |