Beyond Imitation: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models

About

Simulation offers a scalable, low-cost way to enrich vision-language-action (VLA) training and reduce reliance on expensive real-robot demonstrations. However, most sim-real co-training methods rely on supervised fine-tuning (SFT), which treats simulation as a static source of demonstrations and does not exploit large-scale closed-loop interaction. As a result, real-world gains and generalization are often limited. In this paper, we propose RL-Co, a reinforcement learning (RL)-based sim-real co-training framework that leverages interactive simulation while preserving real-world capabilities. Our method follows a generic two-stage design. First, we warm-start the policy with SFT on a mixture of real and simulated demonstrations. Then we fine-tune it with RL in simulation while adding an auxiliary supervised loss on real-world data, which anchors the policy and mitigates catastrophic forgetting. We evaluate the framework on four real-world tabletop manipulation tasks with two representative VLA architectures, OpenVLA and π0.5, and observe consistent improvements over real-only fine-tuning and SFT-based co-training, including +24% real-world success rate on OpenVLA and +20% on π0.5. Beyond higher success rates, RL co-training yields stronger generalization to unseen task variations and substantially better real-world data efficiency, providing a practical and scalable way to leverage simulation for real-robot deployment.
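
The two-stage recipe can be illustrated with a minimal sketch. It assumes a stage-2 objective of the form L = L_RL + λ·L_SFT(real). It also replaces the VLA with a toy single-state softmax policy over three discrete actions and uses REINFORCE with a batch-mean baseline as the RL algorithm. None of these choices come from the paper, and the names `rl_co_train`, `sim_reward`, and `lam` are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def sft_grad(logits, demo_actions):
    """Gradient of the mean negative log-likelihood of demo actions w.r.t. the logits."""
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for a in demo_actions:
        for k in range(len(logits)):
            grad[k] += probs[k] - (1.0 if k == a else 0.0)
    return [g / len(demo_actions) for g in grad]


def reinforce_grad(logits, sim_reward, n_rollouts, rng):
    """REINFORCE estimate of the gradient of -E[reward] w.r.t. the logits (batch-mean baseline)."""
    probs = softmax(logits)
    actions = range(len(logits))
    samples = []
    for _ in range(n_rollouts):
        a = rng.choices(actions, weights=probs)[0]  # one toy simulated "rollout"
        samples.append((a, sim_reward(a)))
    baseline = sum(r for _, r in samples) / n_rollouts
    grad = [0.0] * len(logits)
    for a, r in samples:
        advantage = r - baseline
        for k in actions:
            # d log pi(a) / d logit_k = 1[k == a] - p_k; negated because we minimize a loss
            grad[k] -= advantage * ((1.0 if k == a else 0.0) - probs[k])
    return [g / n_rollouts for g in grad]


def rl_co_train(real_demos, sim_demos, sim_reward, n_actions=3,
                sft_steps=200, rl_steps=300, lr=0.5, lam=0.1, seed=0):
    """Hypothetical two-stage sim-real co-training loop on a toy policy."""
    rng = random.Random(seed)
    logits = [0.0] * n_actions

    # Stage 1: warm-start with SFT on a mixture of real and simulated demonstrations.
    mixed = real_demos + sim_demos
    for _ in range(sft_steps):
        g = sft_grad(logits, mixed)
        logits = [x - lr * gi for x, gi in zip(logits, g)]

    # Stage 2: RL in simulation plus an auxiliary SFT loss on real data only (weight lam).
    for _ in range(rl_steps):
        g_rl = reinforce_grad(logits, sim_reward, n_rollouts=16, rng=rng)
        g_real = sft_grad(logits, real_demos)
        logits = [x - lr * (gr + lam * gs) for x, gr, gs in zip(logits, g_rl, g_real)]

    return softmax(logits)


# Usage: real demos mostly pick action 1; the toy simulator rewards action 1.
real = [1, 1, 1, 0]
sim = [1, 2, 1, 2]
probs = rl_co_train(real, sim, sim_reward=lambda a: 1.0 if a == 1 else 0.0)
print([round(p, 3) for p in probs])
```

Setting `lam=0` reduces stage 2 to pure RL on simulated reward. A larger `lam` pulls the policy more strongly toward the real-demonstration distribution, which is the anchoring role the abstract assigns to the auxiliary real-data loss.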

Liangzhi Shi, Shuaihang Chen, Feng Gao, Yinuo Chen, Kang Chen, Tonghe Zhang, Hongzhi Zang, Weinan Zhang, Chao Yu, Yu Wang • 2026

Related benchmarks

Task	Dataset	Metric	Result	#
Close Drawer	Real-world Tabletop Manipulation: Close Drawer	Success Rate (%)	100	6
Open Drawer	Real-world Tabletop Manipulation: Open Drawer	Success Rate (%)	65	6
Pick-&-Place	Real-world Tabletop Manipulation: Pick and Place	Success Rate (%)	81.3	6
Push Cube	Real-world Tabletop Manipulation: Push Cube	Success Rate (%)	68.3	6
Pick-&-Place	Pick and Place (In-Distribution)	Success Rate (%)	81.3	3
Pick-&-Place	Pick and Place, Unseen Objects (Out-of-Distribution)	Success Rate (%)	56.3	3
Pick-&-Place	Pick and Place, Unseen States (Out-of-Distribution)	Success Rate (%)	70	3
