Self-Improving Vision-Language-Action Models with Data Generation via Residual RL

About

Supervised fine-tuning (SFT) has become the de facto post-training strategy for large vision-language-action (VLA) models, but its reliance on costly human demonstrations limits scalability and generalization. We propose Probe, Learn, Distill (PLD), a three-stage plug-and-play framework that improves VLAs through residual reinforcement learning (RL) and distribution-aware data collection. In Stage 1, we train lightweight residual actors to probe failure regions of the VLA generalist. In Stage 2, we use a hybrid rollout scheme that aligns collected trajectories with the generalist's deployment distribution while capturing recovery behaviors. In Stage 3, we distill the curated trajectories back into the generalist with standard SFT. PLD achieves near-saturated 99% task success on LIBERO, over 50% gains in SimplerEnv, and 100% success on real-world Franka and YAM arm manipulation tasks. Ablations show that residual probing and distribution-aware replay are key to collecting deployment-aligned data that improves both seen and unseen tasks, offering a scalable path toward self-improving VLA models.
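The three stages can be illustrated with a toy sketch. Everything below is an assumption made for illustration and not the authors' implementation: the 1-D environment, the `base_policy` and `residual_policy` stand-ins, the random `probe_steps` schedule, and the success filter are all hypothetical. It only shows the general shape of a hybrid rollout in which a frozen base policy acts alone for a while, a residual correction is then added on top, and successful trajectories are kept as SFT data.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    steps: list = field(default_factory=list)  # (state, action) pairs
    success: bool = False


def base_policy(state):
    # Hypothetical stand-in for the frozen generalist: always moves +1,
    # so on its own it steps past a goal at 4.5 without ever reaching it.
    return 1.0


def residual_policy(state, goal):
    # Hypothetical stand-in for a Stage 1 residual actor: a correction
    # added to the base action so that the sum moves toward the goal.
    desired = max(-1.0, min(1.0, goal - state))
    return desired - base_policy(state)


def hybrid_rollout(start, goal, probe_steps, horizon=20, tol=1e-6):
    """Base policy alone for `probe_steps` steps, then base + residual."""
    traj = Trajectory()
    state = start
    for t in range(horizon):
        action = base_policy(state)
        if t >= probe_steps:
            action += residual_policy(state, goal)
        traj.steps.append((state, action))
        state += action
        if abs(state - goal) < tol:
            traj.success = True
            break
    return traj


def collect_dataset(num_episodes, goal=4.5, max_probe=8, seed=0):
    """Stage 2 (toy version): keep only successful hybrid rollouts."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(num_episodes):
        probe_steps = rng.randint(0, max_probe)  # assumed schedule
        traj = hybrid_rollout(0.0, goal, probe_steps)
        if traj.success:
            dataset.append(traj)
    return dataset


def to_sft_pairs(dataset):
    # Stage 3 input (toy version): flatten trajectories into
    # (state, action) pairs for supervised fine-tuning.
    return [pair for traj in dataset for pair in traj.steps]
```

In this toy setting, a rollout with a long probe phase first overshoots the goal under the base policy and then records corrective actions once the residual is active, which is the kind of recovery behavior the abstract says the collected data should capture.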

Wenli Xiao, Haotian Lin, Andy Peng, Haoru Xue, Tairan He, Yuqi Xie, Fengyuan Hu, Jimmy Wu, Zhengyi Luo, Linxi "Jim" Fan, Guanya Shi, Yuke Zhu • 2025

Related benchmarks

Task	Dataset	Result
Robotic Assembly	AirPods assembly	Grasp Case Success Rate: 100	10
Sparse-reward manipulation	Square simulated environment	Success Rate: 84	6
Sparse-reward manipulation	Coffee simulated environment	Success Rate: 68	6
Sparse-reward manipulation	Mug Cleanup simulated environment	Success Rate: 46	6
Sparse-reward manipulation	Threading simulated environment	Success Rate: 0	6
Sparse-reward manipulation	Nut Assembly simulated environment	Success Rate: 0	6
Sparse-reward manipulation	Hammer Cleanup simulated environment	Success Rate: 96	6
Block Assembly	Block Assembly Real-world	Success Rate: 34.3	5
Ethernet Insertion	Ethernet Insertion Real-world	Success Rate: 65.7	5
Power Plug Insertion	Power Plug Insertion Real-world	Success Rate: 45.7	5

Showing 10 of 15 rows
