StarVLA-$\alpha$: Reducing Complexity in Vision-Language-Action Systems

About

Vision-Language-Action (VLA) models have recently emerged as a promising paradigm for building general-purpose robotic agents. However, the VLA landscape remains highly fragmented and complex, as existing approaches vary substantially in architectures, training data, embodiment configurations, and benchmark-specific engineering. In this work, we introduce StarVLA-$\alpha$, a simple yet strong baseline designed to study VLA design choices under controlled conditions. StarVLA-$\alpha$ deliberately minimizes architectural and pipeline complexity to reduce experimental confounders and enable systematic analysis. Specifically, we re-evaluate several key design axes, including action modeling strategies, robot-specific pretraining, and interface engineering. Across unified multi-benchmark training on LIBERO, SimplerEnv, RoboTwin, and RoboCasa, the same simple baseline remains highly competitive, indicating that a strong VLM backbone combined with minimal design is already sufficient to achieve strong performance without relying on additional architectural complexity or engineering tricks. Notably, our single generalist model outperforms $\pi_{0.5}$ by 20\% on the public real-world RoboChallenge benchmark. We expect StarVLA-$\alpha$ to serve as a solid starting point for future research in the VLA regime. Code will be released at https://github.com/starVLA/starVLA.

Jinhui Ye, Ning Gao, Senqiao Yang, Jinliang Zheng, Zixuan Wang, Yuxin Chen, Pengguang Chen, Yilun Chen, Shu Liu, Jiaya Jia • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Robotic Manipulation | LIBERO | Spatial Success Rate: 99 | 314 |
| Robotic Manipulation | LIBERO-Plus | Average Score: 79.7 | 107 |
| Robotic Manipulation | RoboTwin 2.0 | -- | 64 |
| Robot Manipulation | RoboCasa-GR1 24 tasks | Average Success Rate: 57.3 | 10 |
| Robotic Manipulation | SimplerEnv | WidowX Score: 65.2 | 7 |
| arrange flowers | Table30 RoboChallenge ARX5 | Success Rate: 40 | 6 |
| arrange paper cups | RoboChallenge | Success Rate (SR): 20 | 3 |
| fold dishcloth | RoboChallenge | Success Rate: 0 | 3 |
| place shoes on rack | RoboChallenge | Success Rate (SR): 50 | 3 |
| put cup on coaster | RoboChallenge | Success Rate: 100 | 3 |
Showing 10 of 17 rows
