Test-Time Perturbation Learning with Delayed Feedback for Vision-Language-Action Models

About

Vision-Language-Action models (VLAs) achieve remarkable performance in sequential decision-making but remain fragile to subtle environmental shifts, such as small changes in object pose. We attribute this brittleness to trajectory overfitting, where VLAs over-attend to spurious correlations between actions and entities and then reproduce memorized action patterns. We propose Perturbation learning with Delayed Feedback (PDF), a verifier-free test-time adaptation framework that improves decision performance without fine-tuning the base model. PDF mitigates these spurious correlations through uncertainty-based data augmentation and action voting, while an adaptive scheduler allocates augmentation budgets to balance performance and efficiency. To further improve stability, PDF learns a lightweight perturbation module that retrospectively adjusts action logits guided by delayed feedback, correcting overconfidence. Experiments on LIBERO (+7.4% success rate) and Atari (+10.3 human-normalized score) demonstrate consistent gains of PDF in task success over the vanilla VLA and over VLAs with test-time adaptation, establishing a practical path toward reliable test-time adaptation in multimodal decision-making agents. The code is available at https://github.com/zhoujiahuan1991/CVPR2026-PDF.
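
The idea of uncertainty-based augmentation with action voting can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the entropy-based budget rule, the `policy` and `augment` callables, and the `max_views` parameter are all assumptions made for illustration. The sketch assumes a discrete action space, spends more augmented views when the policy's action distribution is uncertain, and returns the majority action.

```python
import math
import random
from collections import Counter


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def augmentation_budget(probs, max_views):
    """Hypothetical scheduler: scale the number of augmented views by
    the normalized entropy of the action distribution (0 = confident)."""
    if len(probs) < 2:
        return 0
    normalized = entropy(probs) / math.log(len(probs))
    return round(normalized * max_views)


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def vote_action(policy, obs, augment, max_views=8, rng=None):
    """Majority vote over the original and augmented observations.

    policy(obs) -> list of action logits (assumed interface)
    augment(obs, rng) -> perturbed observation (assumed interface)
    """
    rng = rng or random.Random(0)
    base_probs = softmax(policy(obs))
    votes = [argmax(base_probs)]
    for _ in range(augmentation_budget(base_probs, max_views)):
        votes.append(argmax(policy(augment(obs, rng))))
    return Counter(votes).most_common(1)[0][0]
```

In this sketch a confident prediction receives no extra views, so the added cost is paid only on uncertain steps. The learned perturbation module and the delayed-feedback update described in the abstract are not modeled here.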

Zehua Zang, Xi Wang, Fuchun Sun, Xiao Xu, Lixiang Lium, Jiahuan Zhou, Jiangmeng Li • 2026

Related benchmarks

Task	Dataset	Result
Robot Manipulation	LIBERO Object	Success Rate 72	139
Robotic Manipulation	LIBERO Long	Success Rate 59	97
Robotic Manipulation	LIBERO Goal	Success Rate 86	55
Robotic Manipulation	LIBERO Average across suites	Success Rate (SR) 77	29
Robotic Manipulation	LIBERO Spatial	Success Rate (SR) 90	28

