When Robots Obey the Patch: Universal Transferable Patch Attacks on Vision-Language-Action Models
About
Vision-Language-Action (VLA) models are vulnerable to adversarial attacks, yet universal and transferable attacks remain underexplored: most existing patches overfit to a single model and fail in black-box settings. To address this gap, we present a systematic study of universal, transferable adversarial patches against VLA-driven robots under unknown architectures, finetuned variants, and sim-to-real shifts. We introduce UPA-RFAS (Universal Patch Attack via Robust Feature, Attention, and Semantics), a unified framework that learns a single physical patch in a shared feature space while promoting cross-model transfer. UPA-RFAS combines (i) a feature-space objective with an $\ell_1$ deviation prior and a repulsive InfoNCE loss to induce transferable representation shifts, (ii) a robustness-augmented two-phase min-max procedure in which an inner loop learns imperceptible sample-wise perturbations and an outer loop optimizes the universal patch against this hardened neighborhood, and (iii) two VLA-specific losses: Patch Attention Dominance to hijack text$\to$vision attention and Patch Semantic Misalignment to induce image-text mismatch without labels. Experiments across diverse VLA models, manipulation suites, and physical executions show that UPA-RFAS transfers consistently across models, tasks, and viewpoints, exposing a practical patch-based attack surface and establishing a strong baseline for future defenses.
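To make component (i) concrete, the sketch below shows one plausible reading of the feature-space objective: a repulsive (sign-flipped) InfoNCE term that pushes patched features away from their clean counterparts, plus an $\ell_1$ term rewarding large feature deviation. The function names, sign conventions, and PyTorch framing are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn.functional as F

def repulsive_infonce(clean_feats, adv_feats, temperature=0.1):
    # Negated InfoNCE: minimizing this drives each patched feature away
    # from its own clean feature, relative to the other samples in the batch.
    clean = F.normalize(clean_feats, dim=-1)
    adv = F.normalize(adv_feats, dim=-1)
    logits = adv @ clean.t() / temperature                # (B, B) cosine similarities
    labels = torch.arange(adv.size(0), device=adv.device)
    return -F.cross_entropy(logits, labels)               # flipped sign: repel, don't align

def feature_space_objective(clean_feats, adv_feats, l1_weight=0.01):
    # Attack objective to *minimize*: repulsion plus an l1 deviation prior
    # that rewards large per-dimension representation shifts.
    l1_deviation = (adv_feats - clean_feats).abs().mean()
    return repulsive_infonce(clean_feats, adv_feats) - l1_weight * l1_deviation
```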
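Component (ii) then reads as a standard min-max alternation. A minimal single-step sketch, assuming a `patch` tensor created with `requires_grad=True`, a hypothetical `apply_patch` compositing function, and an attack objective where lower means a stronger attack (such as the one above); all names here are illustrative:

```python
import torch

def upa_rfas_step(model, patch, images, apply_patch, objective,
                  inner_steps=5, inner_eps=4 / 255, patch_lr=0.01):
    # Inner phase (max): fit small sample-wise perturbations that ascend the
    # attack objective, hardening the inputs against the current patch.
    delta = torch.zeros_like(images, requires_grad=True)
    for _ in range(inner_steps):
        loss = objective(model, apply_patch(images + delta, patch))
        (grad,) = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += (inner_eps / inner_steps) * grad.sign()  # resist the attack
            delta.clamp_(-inner_eps, inner_eps)
    # Outer phase (min): update the universal patch against the hardened inputs.
    loss = objective(model, apply_patch(images + delta.detach(), patch))
    (grad,) = torch.autograd.grad(loss, patch)
    with torch.no_grad():
        patch -= patch_lr * grad.sign()  # strengthen the attack
        patch.clamp_(0.0, 1.0)
    return loss.item()
```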
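Finally, the two VLA-specific losses in (iii) admit compact label-free forms: concentrate text$\to$vision cross-attention mass on the vision tokens the patch covers, and push patched image embeddings away from their paired instruction embeddings. The tensor shapes and names below are assumptions for illustration, under one plausible reading of the abstract.

```python
import torch.nn.functional as F

def patch_attention_dominance(cross_attn, patch_token_mask):
    # cross_attn: (B, heads, T_text, T_vision) text->vision attention weights;
    # patch_token_mask: (T_vision,) bool marking tokens overlapping the patch.
    # Minimizing this loss maximizes the attention mass landing on the patch.
    mass_on_patch = cross_attn[..., patch_token_mask].sum(dim=-1)  # (B, heads, T_text)
    return -mass_on_patch.mean()

def patch_semantic_misalignment(image_emb, text_emb):
    # Label-free mismatch: minimize cosine similarity between patched image
    # embeddings and their paired instruction embeddings.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    return (image_emb * text_emb).sum(dim=-1).mean()
```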
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Robot Manipulation | LIBERO simulation | Average Success Rate | 0.5 | 36 |
| Robot Manipulation | LIBERO Simulated 1.0 (test) | Spatial Success Rate | 91 | 24 |
| Robot Manipulation | LIBERO Physical 1.0 (test) | Spatial Success Rate | 93 | 24 |
| Robotic Manipulation | LIBERO Physical | Spatial Success Rate | 0.00e+0 | 9 |