
Towards Long-Lived Robots: Continual Learning VLA Models via Reinforcement Fine-Tuning

About

Pretrained on large-scale, diverse datasets, VLA models demonstrate strong generalization and adaptability as general-purpose robotic policies. However, Supervised Fine-Tuning (SFT), the primary mechanism for adapting VLAs to downstream domains, requires substantial task-specific data and is prone to catastrophic forgetting. To address these limitations, we propose LifeLong-RFT, a simple yet effective Reinforcement Fine-Tuning (RFT) strategy for VLA models that requires neither online environmental feedback nor pre-trained reward models. By integrating chunking-level on-policy reinforcement learning with the proposed Multi-Dimensional Process Reward (MDPR) mechanism, LifeLong-RFT quantifies the heterogeneous contributions of intermediate action chunks across three dimensions to facilitate policy optimization. Specifically, (1) the Quantized Action Consistency Reward (QACR) ensures accurate action prediction within the discrete action space; (2) the Continuous Trajectory Alignment Reward (CTAR) aligns decoded continuous action chunks with reference trajectories for precise control; and (3) the Format Compliance Reward (FCR) guarantees the structural validity of outputs. Comprehensive experiments across SimplerEnv, LIBERO, and real-world tasks demonstrate that LifeLong-RFT performs strongly in multi-task learning. Furthermore, for continual learning on the LIBERO benchmark, our method achieves a 22% gain in average success rate over SFT while adapting effectively to new tasks using only 20% of the training data. Overall, our method provides a promising post-training paradigm for VLAs.
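The three reward dimensions described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the token-match proxy for QACR, the exponential distance-to-reward mapping for CTAR, the bracket-based format check for FCR, and the weighting scheme are all choices made here for concreteness.

```python
import numpy as np

def qacr(pred_tokens, ref_tokens):
    """Quantized Action Consistency Reward (illustrative proxy):
    fraction of discrete action tokens matching the reference."""
    pred, ref = np.asarray(pred_tokens), np.asarray(ref_tokens)
    return float((pred == ref).mean())

def ctar(pred_traj, ref_traj):
    """Continuous Trajectory Alignment Reward (illustrative proxy):
    maps mean L2 distance between decoded and reference action
    chunks into (0, 1] via exp(-distance)."""
    diff = np.asarray(pred_traj) - np.asarray(ref_traj)
    dist = np.linalg.norm(diff, axis=-1).mean()
    return float(np.exp(-dist))

def fcr(output_text):
    """Format Compliance Reward (illustrative proxy): 1.0 if the
    output looks like a well-formed bracketed action chunk, else 0."""
    s = output_text.strip()
    return 1.0 if s.startswith("[") and s.endswith("]") else 0.0

def mdpr(pred_tokens, ref_tokens, pred_traj, ref_traj, output_text,
         weights=(0.4, 0.4, 0.2)):
    """Combine the three dimensions into one per-chunk reward.
    The weights are an assumption, not values from the paper."""
    w1, w2, w3 = weights
    return (w1 * qacr(pred_tokens, ref_tokens)
            + w2 * ctar(pred_traj, ref_traj)
            + w3 * fcr(output_text))
```

A perfectly matching chunk with valid formatting would score the maximum combined reward under this weighting.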

Yuan Liu, Haoran Li, Shuai Tian, Yuxing Qin, Yuhui Chen, Yupeng Zheng, Yongzhen Huang, Dongbin Zhao • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Robot Manipulation | SimplerEnv WidowX Robot tasks (test) | Success Rate (Spoon) | 84.3 | 79 |
| Robot Manipulation | SimplerEnv Google Robot tasks, Visual Matching | Pick Coke Can Success Rate | 94 | 62 |
| Multi-task Learning | LIBERO | Object Score | 99.2 | 18 |
| Continual Learning | LIBERO Object | FWT | 96 | 8 |
| Continual Learning | LIBERO Goal | FWT | 92.4 | 8 |
| Continual Learning | LIBERO Spatial | FWT | 94 | 6 |
| Continual Learning | LIBERO Long | Forward Transfer (FWT) | 74.2 | 6 |
| Continual Learning | Real-world | FWT | 80 | 4 |
| Hang Chinese Knot | Real-world 1.0 (test) | Success Rate | 75 | 4 |
| Multi-Task Learning (Overall) | Real-world 1.0 (test) | Success Rate | 87.5 | 4 |

Showing 10 of 13 rows.
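Several continual-learning rows above report FWT (Forward Transfer). As an illustrative sketch only (the benchmark's exact definition may differ), one common convention averages each task's success rate measured right after that task is learned:

```python
import numpy as np

def forward_transfer(success_matrix):
    """Illustrative FWT: success_matrix[i][j] is the success rate on
    task j after sequentially training through task i. This sketch
    averages the diagonal, i.e. performance on each task immediately
    after it is learned (a simplifying assumption, not necessarily
    the benchmark's exact formula)."""
    R = np.asarray(success_matrix, dtype=float)
    return float(np.mean(np.diag(R)))
```

For a two-task run where the policy reaches 0.9 on task 1 and 0.7 on task 2 as each is learned, this sketch reports an FWT of 0.8.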
