UniJEPA: Enhancing Robot Policy via Unified Continuous and Discrete Representation Learning

About

Building generalist robot policies that can handle diverse tasks in open-ended environments is a central challenge in robotics. To leverage knowledge from large-scale pretraining, prior work on vision-language-action (VLA) models has typically built generalist policies on top of either vision-language understanding models (VLMs) or generative models. However, both semantic understanding from vision-language pretraining and visual dynamics modeling from visual-generation pretraining are crucial for embodied robots. Recent unified models of generation and understanding have demonstrated strong capabilities in both comprehension and generation through large-scale pretraining. We posit that robotic policy learning can likewise benefit from the combined strengths of understanding, planning, and continuous future representation learning. Building on this insight, we introduce UniJEPA, which acquires the ability to dynamically model high-dimensional visual features through pretraining on over 1M internet-scale instructional manipulation videos. UniJEPA is then fine-tuned on data collected from the robot embodiment, which lets it learn mappings from predictive representations to action tokens. Extensive experiments show that our approach consistently outperforms baseline methods, by 9% in simulation environments and by 12% on real-world out-of-distribution tasks.

Jianke Zhang, Yucheng Hu, Yanjiang Guo, Xiaoyu Chen, Yichen Liu, Wenna Chen, Chaochao Lu, Jianyu Chen • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Robot Manipulation | SimplerEnv Google Robot Visual Matching | Pick Coke Can | 98.7 | 65
Long-horizon language-conditioned manipulation | Calvin ABC->D | Success Rate (Seq 1) | 97.3 | 12
Robot Manipulation | Franka Panda Seen Tasks | Success Rate | 88 | 10
Robot Manipulation | Franka Panda (Unseen Tasks) | Success Rate | 80 | 10
Robotic Manipulation | SimplerEnv-WindowsX visual matching | Carrot on Plate Final Success Rate | 63 | 10
Drawer Operation | Franka-Emika Panda Robotarm (Unseen) | Success Rate | 80 | 6
Pick-&-Place | Franka-Emika Panda Robotarm (Unseen) | Success Rate | 85 | 6
Press Button | Franka-Emika Panda Robotarm (Seen) | Success Rate | 95 | 6
Route Cable | Franka-Emika Panda Robotarm (Seen) | Success Rate | 80 | 6
Route Cable | Franka-Emika Panda Robotarm (Unseen) | Success Rate | 75 | 6
Showing 10 of 15 rows
