StableVLA: Towards Robust Vision-Language-Action Models without Extra Data

About

It is infeasible to encompass all possible disturbances within the training dataset. This raises a critical question regarding the robustness of Vision-Language-Action (VLA) models when they encounter unseen real-world visual disturbances, particularly under imperfect visual conditions. In this work, we conduct a systematic study of recent state-of-the-art VLA models and reveal a significant performance drop when visual disturbances absent from the training data are introduced. To mitigate this issue, we propose a lightweight adapter module grounded in information theory, termed the Information Bottleneck Adapter (IB-Adapter), which selectively filters potential noise from visual inputs. Without requiring any extra data or augmentation strategies, IB-Adapter consistently improves over the baseline by an average of 30% while adding fewer than 10M parameters. Furthermore, even with a 14x smaller backbone (0.5B parameters) and no pre-training on the Open X-Embodiment dataset, our model StableVLA achieves robustness competitive with 7B-scale state-of-the-art VLAs. Our approach also maintains accuracy on long-horizon tasks and surpasses OpenPi under both synthetic and physical visual corruptions.
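
For context, the information bottleneck principle that the IB-Adapter is named after is usually stated as the trade-off below. This is the generic textbook objective and may differ from the loss actually used in the paper. The symbols are notation assumed here, not taken from the source: X is the visual input, Z the filtered representation, Y the prediction target (for a VLA, the action), and β a trade-off weight.

```latex
% Generic information bottleneck objective (assumed notation, not the paper's exact loss):
% compress the input X into Z while keeping the information Z carries about the target Y.
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

Minimizing I(X; Z) discards input detail, including nuisance variation such as visual noise, while the β I(Z; Y) term rewards keeping whatever is predictive of the target.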

Yiyang Fu, Chubin Zhang, Shukai Gong, Yufan Deng, Kaiwei Sun, Qiyang Min, Qibin Hou, Yansong Tang, Jianan Wang, Daquan Zhou • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Robot Manipulation | LIBERO Object | -- | 127 |
| Robot Manipulation | LIBERO Long | -- | 35 |
| Language-conditioned robot manipulation | LIBERO-Spatial Severity 3 | Success Rate 94.4 | 5 |
| Language-conditioned robot manipulation | LIBERO-Spatial Severity 4 | Success Rate 92.1 | 5 |
| Language-conditioned robot manipulation | LIBERO-Spatial Severity 5 | Success Rate 82 | 5 |
| Language-conditioned robot manipulation | LIBERO-Goal Clean | Success Rate 98 | 5 |
| Language-conditioned robot manipulation | LIBERO-Goal Severity 4 | Success Rate 85 | 5 |
| Language-conditioned robot manipulation | LIBERO-Goal Severity 5 | Success Rate 71.9 | 5 |
| Robot Manipulation | LIBERO Goal | Success Rate (C) 98 | 5 |
| Language-conditioned robot manipulation | LIBERO-Spatial Clean | Success Rate 96.2 | 5 |
Showing 10 of 27 rows
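
As a small worked example, the listed rows can be used to see how far the success rate falls from the clean setting at each corruption severity. The sketch below copies only the values shown above and assumes they are percentages. The names `results` and `drop_from_clean` are hypothetical helpers introduced for illustration.

```python
# Success rates copied from the rows listed above (assumed to be percentages).
results = {
    "LIBERO-Spatial": {
        "Clean": 96.2,
        "Severity 3": 94.4,
        "Severity 4": 92.1,
        "Severity 5": 82.0,
    },
    "LIBERO-Goal": {
        "Clean": 98.0,
        "Severity 4": 85.0,
        "Severity 5": 71.9,
    },
}


def drop_from_clean(scores):
    """Return the absolute drop in success rate relative to the clean setting."""
    clean = scores["Clean"]
    return {
        level: round(clean - value, 1)
        for level, value in scores.items()
        if level != "Clean"
    }


for suite, scores in results.items():
    print(suite, drop_from_clean(scores))
```

Only the listed rows are covered; severity levels that are not shown are left out rather than guessed.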
