Test-Time Perturbation Learning with Delayed Feedback for Vision-Language-Action Models

About

Vision-Language-Action models (VLAs) achieve remarkable performance in sequential decision-making but remain fragile to subtle environmental shifts, such as small changes in object pose. We attribute this brittleness to trajectory overfitting, where VLAs over-attend to spurious correlations between actions and entities and then reproduce memorized action patterns. We propose Perturbation learning with Delayed Feedback (PDF), a verifier-free test-time adaptation framework that improves decision performance without fine-tuning the base model. PDF mitigates the spurious correlations through uncertainty-based data augmentation and action voting, while an adaptive scheduler allocates augmentation budgets to balance performance and efficiency. To further improve stability, PDF learns a lightweight perturbation module that retrospectively adjusts action logits using delayed feedback, correcting overconfidence. Experiments on LIBERO (+7.4% success rate) and Atari (+10.3 human-normalized score) demonstrate consistent gains in task success for PDF over both the vanilla VLA and the VLA with test-time adaptation, establishing a practical path toward reliable test-time adaptation in multimodal decision-making agents. The code is available at https://github.com/zhoujiahuan1991/CVPR2026-PDF.

Zehua Zang, Xi Wang, Fuchun Sun, Xiao Xu, Lixiang Lium, Jiahuan Zhou, Jiangmeng Li • 2026
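
The abstract describes uncertainty-based augmentation with action voting and an adaptive budget scheduler, but gives no formulas. The following is a minimal sketch of that general idea, not the authors' implementation (their code is in the linked repository). Everything specific here is an assumption made for illustration: normalized entropy as the uncertainty measure, a linear budget rule, a simple majority vote, and all function names (`normalized_entropy`, `augmentation_budget`, `vote_action`).

```python
import math
from collections import Counter


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy divided by log(number of actions), so it lies in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def augmentation_budget(uncertainty, max_views):
    """Assumed scheduler: the number of augmented views grows linearly with uncertainty."""
    return min(max_views, max(0, int(round(uncertainty * max_views))))


def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def vote_action(policy, observation, augment, max_views=8):
    """Pick an action by majority vote over the original and augmented observations.

    policy:  hypothetical callable mapping an observation to action logits.
    augment: hypothetical callable returning a perturbed copy of an observation.
    Returns (voted_action, number_of_augmented_views_used).
    """
    base_probs = softmax(policy(observation))
    budget = augmentation_budget(normalized_entropy(base_probs), max_views)

    votes = Counter()
    votes[argmax(base_probs)] += 1  # the unaugmented view always votes
    for _ in range(budget):
        votes[argmax(policy(augment(observation)))] += 1

    # most_common keeps first-seen order among equal counts, so ties
    # resolve in favor of the action that was voted for earliest.
    action, _ = votes.most_common(1)[0]
    return action, budget
```

With a confident policy output the budget drops to zero and the call reduces to a single forward pass; with a near-uniform output the full budget is spent. This is the performance-versus-efficiency trade-off the abstract attributes to the scheduler, shown here only in toy form.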

Related benchmarks

Task | Dataset | Result | Rank
Robot Manipulation | LIBERO Object | Success Rate: 72 | 127
Robotic Manipulation | LIBERO Long | Success Rate: 59 | 91
Robotic Manipulation | LIBERO Goal | Success Rate: 86 | 42
Robotic Manipulation | LIBERO Average across suites | Success Rate (SR): 77 | 29
Robotic Manipulation | LIBERO Spatial | Success Rate (SR): 90 | 28