
What Can RL Bring to VLA Generalization? An Empirical Study

About

Large Vision-Language-Action (VLA) models have shown significant potential for embodied AI. However, their predominant training via supervised fine-tuning (SFT) limits generalization, because SFT-trained policies are susceptible to compounding errors under distribution shifts. Reinforcement learning (RL) offers a path to overcome these limitations by optimizing for task objectives via trial-and-error, yet a systematic understanding of its specific generalization benefits for VLAs compared to SFT is lacking. To address this, our study introduces a comprehensive benchmark for evaluating VLA generalization and systematically investigates the impact of RL fine-tuning across diverse visual, semantic, and execution dimensions. Our extensive experiments reveal that RL fine-tuning, particularly with PPO, significantly enhances generalization in semantic understanding and execution robustness over SFT, while maintaining comparable visual robustness. We identify PPO as a more effective RL algorithm for VLAs than LLM-derived methods like DPO and GRPO. We also develop a simple recipe for efficient PPO training on VLAs, and demonstrate its practical utility for improving VLA generalization. The project page is at https://rlvla.github.io

Jijia Liu, Feng Gao, Bingwen Wei, Xinlei Chen, Qingmin Liao, Yi Wu, Chao Yu, Yu Wang • 2025
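The abstract contrasts PPO with LLM-derived methods such as GRPO. As a rough illustration of where the two differ, the sketch below puts the textbook PPO clipped surrogate, with advantages from generalized advantage estimation (GAE) over a learned value function, next to a GRPO-style advantage that replaces the critic with per-group return normalization. This is an assumed, generic formulation, not the authors' training recipe; the function names and default hyperparameters (`clip_eps=0.2`, `gamma=0.99`, `lam=0.95`) are illustrative choices, and DPO is not shown.

```python
import math
from statistics import mean, pstdev

# Generic textbook formulations for illustration only (assumption):
# this is NOT the paper's PPO recipe for VLAs.

def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss (to be minimized), averaged over samples."""
    losses = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped objectives.
        losses.append(-min(ratio * adv, clipped * adv))
    return mean(losses)

def gae(rewards, values, gamma=0.99, lam=0.95):
    """GAE for one episode; `values` holds len(rewards) + 1 critic estimates
    (the last entry is the bootstrap value after the final step)."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

def grpo_advantages(group_returns, eps=1e-8):
    """GRPO-style advantage: normalize each rollout's return within its group
    (rollouts sampled from the same prompt/task), so no critic is needed."""
    mu = mean(group_returns)
    sigma = pstdev(group_returns)
    return [(r - mu) / (sigma + eps) for r in group_returns]

if __name__ == "__main__":
    # Sparse task-success reward at the final step of a 3-step episode.
    print(gae([0.0, 0.0, 1.0], [0.2, 0.3, 0.6, 0.0], gamma=1.0, lam=1.0))
    # Four rollouts of the same task: two successes, two failures.
    print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
```

With `gamma = lam = 1`, GAE reduces to return minus the critic's value estimate, roughly `[0.8, 0.7, 0.4]` for the example episode, so each step gets its own credit. The GRPO-style version assigns the same advantage (roughly `+1` or `-1` here) to every step of a rollout based only on how it compares with its group.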

Related benchmarks

Task | Dataset | Metric | Result | Rank
Robot Manipulation | SimplerEnv WidowX Robot tasks | Average Success Rate | 9.25e+3 | 32
Robotic Manipulation | ManiSkill3 | Average Success Rate | 70.5 | 21
Put Eggplant in Basket | SimplerEnv WidowX | Success Rate | 93.7 | 18
Put Spoon on Towel | SimplerEnv WidowX | Success Rate | 93 | 18
Stack Green on Yellow | SimplerEnv WidowX | Success Rate | 92 | 18
Put Carrot on Plate | SimplerEnv WidowX | Success Rate | 0.913 | 18
Robot Manipulation | ManiSkill (Out-Of-Distribution) | Vision Score | 74 | 12
Robot Manipulation | ManiSkill In-Distribution | Success Rate | 88.5 | 9
Robot Manipulation | ManiSkill Vision Generalization 3 (Unseen simulation tasks) | Success Rate (Table) | 87.08 | 8
Robot Manipulation | ManiSkill Semantics Generalization 3 (Unseen simulation tasks) | Object OOD Score | 62.5 | 8