PRTS: A Primitive Reasoning and Tasking System via Contrastive Representations

About

Vision-Language-Action (VLA) models advance robotic control via strong visual-linguistic priors. However, existing VLAs predominantly frame pretraining as supervised behavior cloning, overlooking the fundamental nature of robot learning as a goal-reaching process that requires understanding temporal task progress. We present PRTS (Primitive Reasoning and Tasking System), a VLA foundation model that reformulates pretraining through Goal-Conditioned Reinforcement Learning. By treating language instructions as goals and employing contrastive reinforcement learning, PRTS learns a unified embedding space in which the inner product of state-action and goal embeddings approximates the log-discounted goal occupancy, that is, the probability of reaching the language-specified goal from the current state-action pair. This score quantitatively assesses physical feasibility, going beyond static semantic matching. PRTS draws this dense goal-reachability supervision directly from offline trajectories without reward annotations and folds it into the VLM backbone via a role-aware causal mask, incurring negligible overhead over vanilla behavior cloning. This paradigm endows the high-level reasoning system with intrinsic goal-reachability awareness, bridging semantic reasoning and temporal task progress, and further benefits goal-conditioned action prediction. Pretrained on 167B tokens of diverse manipulation and embodied-reasoning data, PRTS reaches state-of-the-art performance on LIBERO, LIBERO-Pro, LIBERO-Plus, SimplerEnv, and a real-world suite of 14 complex tasks. Gains are particularly substantial in long-horizon, contact-rich, and zero-shot novel-instruction settings, confirming that injecting goal-reachability awareness significantly improves both execution success and long-horizon planning of general-purpose robotic foundation policies.
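The contrastive objective described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a standard in-batch InfoNCE loss, as commonly used in the contrastive RL literature, and geometric sampling of future timesteps to mimic discounting. The names `info_nce_loss` and `sample_future_index` are hypothetical, and the embeddings are plain NumPy arrays rather than outputs of a VLM backbone. In that literature, the optimum of this kind of objective is tied to the log ratio of the discounted goal occupancy to the goal marginal, which is why the inner product can be read as a reachability score.

```python
import numpy as np


def info_nce_loss(sa_emb, goal_emb):
    """In-batch contrastive (InfoNCE) loss; an illustrative assumption, not PRTS code.

    sa_emb:   (B, D) embeddings of state-action pairs.
    goal_emb: (B, D) embeddings of goals. Row i is the positive for row i of
              sa_emb (a goal reached later in the same trajectory); all other
              rows in the batch act as negatives.
    """
    logits = sa_emb @ goal_emb.T                          # (B, B) inner products
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # positives sit on the diagonal


def sample_future_index(t, horizon, gamma, rng):
    """Pick a future timestep with geometric weighting (assumed discounting scheme)."""
    offset = rng.geometric(1.0 - gamma)   # offset >= 1, mean 1 / (1 - gamma)
    return min(t + offset, horizon - 1)   # clip to the end of the trajectory
```

In a setup like the one the abstract describes, `goal_emb` would come from the language instruction rather than from a future state, but the loss has the same shape: matched pairs are pushed toward high inner products and mismatched pairs toward low ones.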

Yang Zhang, Jiangyuan Zhao, Chenyou Fan, Fangzheng Yan, Tian Li, Haitong Tang, Sen Fu, Xuan'er Wu, Qizhen Weng, Weinan Zhang, Xiu Li, Chi Zhang, Chenjia Bai, Xuelong Li • 2026

Related benchmarks

Task	Dataset	Result
Robotic Manipulation	LIBERO-Plus	Language Understanding Score 89.6	414
Robot Manipulation	SimplerEnv WidowX Visual Matching	Average Success Rate 77.1	52
Visuomotor Control	LIBERO	Spatial Score 98.8	29
Robotic Manipulation	LIBERO-Pro (test)	Semantic SR 97	6
Flip Tennis Tube	RealMan dual-arm platform Real-world	Success Rate 90	3
Hand Over	RealMan dual-arm platform Real-world	Success Rate 95	3
Office Long Term	RealMan dual-arm platform Real-world	Success Rate 95	3
Paper Rubbish	RealMan dual-arm platform Real-world	Success Rate 100	3
Pick Shoes	RealMan dual-arm platform Real-world	Success Rate 95	3
Serve Tea	RealMan dual-arm platform Real-world	Success Rate 95	3

Showing 10 of 15 rows
