StableVLA: Towards Robust Vision-Language-Action Models without Extra Data

About

It is infeasible to encompass all possible disturbances within the training dataset. This raises a critical question regarding the robustness of Vision-Language-Action (VLA) models when encountering unseen real-world visual disturbances, particularly under imperfect visual conditions. In this work, we conduct a systematic study based on recent state-of-the-art VLA models and reveal a significant performance drop when visual disturbances absent from the training data are introduced. To mitigate this issue, we propose a lightweight adapter module grounded in information theory, termed the Information Bottleneck Adapter (IB-Adapter), which selectively filters potential noise from visual inputs. Without requiring any extra data or augmentation strategies, IB-Adapter consistently improves over the baseline by an average of 30%, while adding fewer than 10M parameters, demonstrating notable efficiency and effectiveness. Furthermore, even with a 14x smaller backbone (0.5B parameters) and no pre-training on the Open X-Embodiment dataset, our model StableVLA achieves robustness competitive with 7B-scale state-of-the-art VLAs. With negligible parameter overhead (<10M), our approach maintains accuracy on long-horizon tasks and surpasses OpenPi under both synthetic and physical visual corruptions.
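The abstract does not describe how the IB-Adapter is built, so the following is only a minimal sketch of a generic variational information-bottleneck layer, a standard technique swapped in here for illustration. The function name `ib_adapter`, the projection weights `w_mu` and `w_logvar`, and the standard-normal prior are all assumptions, not details from the paper. The idea it shows: visual features are compressed into a stochastic code, and a KL penalty discourages that code from carrying more information than the task needs.

```python
import numpy as np

def ib_adapter(features, w_mu, w_logvar, rng=None):
    """Generic variational-IB bottleneck (illustrative, not the paper's design).

    features: (n_tokens, d_in) array of visual features.
    w_mu, w_logvar: (d_in, d_z) hypothetical projection weights.
    Returns the compressed code z and the KL penalty to a N(0, I) prior.
    """
    mu = features @ w_mu
    logvar = features @ w_logvar
    if rng is None:
        z = mu  # deterministic code, e.g. at inference time
    else:
        # Reparameterised sample: z = mu + sigma * eps
        eps = rng.standard_normal(mu.shape)
        z = mu + eps * np.exp(0.5 * logvar)
    # KL(N(mu, sigma^2) || N(0, I)), summed over latent dims, averaged over tokens
    kl = 0.5 * np.sum(mu**2 + np.exp(logvar) - logvar - 1.0, axis=-1).mean()
    return z, kl

# Toy usage with random features and small random weights (assumed shapes).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
z, kl = ib_adapter(x, 0.1 * rng.standard_normal((8, 3)),
                   0.1 * rng.standard_normal((8, 3)), rng=rng)
```

In a training loop under these assumptions, `kl` would be added to the task loss with a small weight, trading compression of the visual input against task accuracy.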

Yiyang Fu, Chubin Zhang, Shukai Gong, Yufan Deng, Kaiwei Sun, Qiyang Min, Qibin Hou, Yansong Tang, Jianan Wang, Daquan Zhou • 2026

Related benchmarks

Task	Dataset	Result
Robot Manipulation	LIBERO Object	--	139
Robot Manipulation	LIBERO Long	--	44
Language-conditioned robot manipulation	LIBERO-Spatial Severity 3	Success Rate: 94.4	5
Language-conditioned robot manipulation	LIBERO-Spatial Severity 4	Success Rate: 92.1	5
Language-conditioned robot manipulation	LIBERO-Spatial Severity 5	Success Rate: 82	5
Language-conditioned robot manipulation	LIBERO-Goal Clean	Success Rate: 98	5
Language-conditioned robot manipulation	LIBERO-Goal Severity 4	Success Rate: 85	5
Language-conditioned robot manipulation	LIBERO-Goal Severity 5	Success Rate: 71.9	5
Robot Manipulation	LIBERO Goal	Success Rate (C): 98	5
Language-conditioned robot manipulation	LIBERO-Spatial Clean	Success Rate: 96.2	5

Showing 10 of 27 rows
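
To make the robustness trend in the listed rows easier to read, the short sketch below computes how far the success rate falls from the clean setting at each corruption severity. The numbers are copied from the rows shown above (only 10 of 27 rows are visible, so severities missing here are simply absent); the dictionary layout and the helper name `drops_from_clean` are illustrative choices, not part of the benchmark.

```python
# Success rates (%) copied from the visible benchmark rows.
results = {
    "LIBERO-Spatial": {"clean": 96.2, 3: 94.4, 4: 92.1, 5: 82.0},
    "LIBERO-Goal": {"clean": 98.0, 4: 85.0, 5: 71.9},
}

def drops_from_clean(scores):
    """Return {severity: (absolute drop in points, fraction of clean retained)}."""
    clean = scores["clean"]
    return {
        sev: (round(clean - val, 1), round(val / clean, 3))
        for sev, val in scores.items()
        if sev != "clean"
    }

for suite, scores in results.items():
    for sev, (drop, kept) in drops_from_clean(scores).items():
        print(f"{suite} severity {sev}: -{drop} points, {kept:.1%} of clean")
```

On these rows, LIBERO-Goal loses more at severity 5 (26.1 points) than LIBERO-Spatial does (14.2 points).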
