What Can RL Bring to VLA Generalization? An Empirical Study
About
Large Vision-Language Action (VLA) models have shown significant potential for embodied AI. However, their predominant training via supervised fine-tuning (SFT) limits generalization due to susceptibility to compounding errors under distribution shifts. Reinforcement learning (RL) offers a path to overcome these limitations by optimizing for task objectives via trial-and-error, yet a systematic understanding of its specific generalization benefits for VLAs compared to SFT is lacking. To address this, our study introduces a comprehensive benchmark for evaluating VLA generalization and systematically investigates the impact of RL fine-tuning across diverse visual, semantic, and execution dimensions. Our extensive experiments reveal that RL fine-tuning, particularly with PPO, significantly enhances generalization in semantic understanding and execution robustness over SFT, while maintaining comparable visual robustness. We identify PPO as a more effective RL algorithm for VLAs than LLM-derived methods like DPO and GRPO. We also develop a simple recipe for efficient PPO training on VLAs, and demonstrate its practical utility for improving VLA generalization. The project page is at https://rlvla.github.io
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Robot Manipulation | SimplerEnv WidowX Robot tasks | Average Success Rate | 92.5 | 26 |
| Put Eggplant in Basket | SimplerEnv WidowX | Success Rate | 93.7 | 18 |
| Put Spoon on Towel | SimplerEnv WidowX | Success Rate | 93 | 18 |
| Stack Green on Yellow | SimplerEnv WidowX | Success Rate | 92 | 18 |
| Put Carrot on Plate | SimplerEnv WidowX | Success Rate | 91.3 | 18 |
| Robotic Manipulation | ManiSkill3 | Stack Cube Success Rate | 64 | 15 |
| Robot Manipulation | ManiSkill Vision Generalization 3 (Unseen simulation tasks) | Success Rate (Table) | 87.08 | 8 |
| Robot Manipulation | ManiSkill Semantics Generalization 3 (Unseen simulation tasks) | Object OOD Score | 62.5 | 8 |
| Robot Manipulation | ManiSkill Execution Generalization 3 (Unseen simulation tasks) | Object Position Error | 82.08 | 8 |
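The aggregate SimplerEnv WidowX figure appears to be the plain mean of the four per-task success rates. The sketch below checks that arithmetic, assuming all four rates are percentages on the same scale (93.7, 93, 92, and 91.3, with the carrot task read as 91.3%). The dictionary and variable names are illustrative and do not come from the benchmark.

```python
# Per-task success rates (percent) for SimplerEnv WidowX, as listed in the table.
# Assumption: every value is a percentage; the carrot task is read as 91.3%.
success_rates = {
    "Put Eggplant in Basket": 93.7,
    "Put Spoon on Towel": 93.0,
    "Stack Green on Yellow": 92.0,
    "Put Carrot on Plate": 91.3,
}

# Unweighted mean over the four tasks.
average_success_rate = sum(success_rates.values()) / len(success_rates)

print(f"Average success rate: {average_success_rate:.1f}%")
```

Under that assumption the mean comes to 92.5%, which matches the aggregate row once it is read on the same percentage scale.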