TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization

About

Vision-Language-Action (VLA) models have demonstrated strong cross-scenario generalization capabilities in various robotic tasks through large-scale pre-training and task-specific fine-tuning. However, their training paradigm mainly relies on manually collected successful demonstrations, making it difficult to adapt to complex environments when encountering out-of-distribution (OOD) scenarios or execution biases. While Reinforcement Learning (RL) provides a closed-loop optimization framework via an active trial-and-error mechanism, it suffers from sparse rewards, high variance, and unstable optimization in long-horizon robotic tasks. To address these limitations, we propose Trajectory-based Group Relative Policy Optimization (TGRPO), an online RL-based training framework for VLA models. TGRPO leverages task analysis generated by a large language model to automatically construct dense reward functions, providing fine-grained feedback to accelerate convergence and improve credit assignment. The core of our method is a group-based strategy that samples multiple trajectories in parallel and normalizes their rewards within the group, reducing variance through relative comparison. By integrating trajectory-level and step-level advantage estimation, TGRPO captures both global and local optimization signals without relying on a value network. Experiments on four task categories of the LIBERO benchmark demonstrate that TGRPO achieves an average success rate of 80.7%, which is 4.2% higher than that of Supervised Fine-Tuning (SFT) and outperforms other representative RL-based post-training methods.

Zengjue Chen, Runliang Niu, He Kong, Qi Wang, Qianli Xing, Zipei Fan • 2025

Related benchmarks

Task	Dataset	Metric	Result
Robot Manipulation	LIBERO	Object Achievement	92.2	1025
Robot Manipulation	LIBERO	Spatial Success Rate	90	223
Robot Manipulation	LIBERO Object	Success Rate	89	139
Robotic Manipulation	LIBERO Long	Success Rate	59.2	97
Robotic Manipulation	LIBERO Goal	Success Rate	79	55
Robotic Manipulation	LIBERO Average across suites	Success Rate (SR)	76	29
Robotic Manipulation	LIBERO Spatial	Success Rate (SR)	84	28
Multi-task Learning	LIBERO	Object Score	92.2	25
