
UniVLA: Learning to Act Anywhere with Task-centric Latent Actions

About

A generalist robot should perform effectively across various environments. However, most existing approaches rely heavily on scaling action-annotated data to enhance their capabilities. Consequently, they are often limited to a single physical specification and struggle to learn transferable knowledge across different embodiments and environments. To address these limitations, we propose UniVLA, a new framework for learning cross-embodiment vision-language-action (VLA) policies. Our key innovation is to derive task-centric action representations from videos with a latent action model. This enables us to exploit extensive data across a wide spectrum of embodiments and perspectives. To mitigate the effect of task-irrelevant dynamics, we incorporate language instructions and build the latent action model in the DINO feature space. Learned from internet-scale videos, the generalist policy can be deployed to various robots through efficient latent action decoding. We obtain state-of-the-art results across multiple manipulation and navigation benchmarks, as well as in real-robot deployments. UniVLA outperforms OpenVLA while using less than 1/20 of its pretraining compute and 1/10 of its downstream data. Performance continues to improve as heterogeneous data, including human videos, are incorporated into the training pipeline. These results underscore UniVLA's potential to facilitate scalable and efficient robot policy learning.
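The abstract describes the latent action idea only at a high level. As a toy illustration of what a discrete "latent action" is, and explicitly not UniVLA's actual architecture, the sketch assumes: random vectors stand in for DINO features of two frames; the encoder is simply their difference (a real model would use a learned network and, per the abstract, also condition on the language instruction, which is omitted here); a nearest-neighbour codebook lookup (vector quantization, a common choice for latent action models, assumed here) turns that into a discrete token; and a hypothetical per-robot linear head decodes the token into a 7-D action. All names, sizes, and weights are hypothetical.

```python
import random

FEAT_DIM = 8        # hypothetical stand-in for a DINO feature dimension
CODEBOOK_SIZE = 4   # hypothetical number of discrete latent actions
ACTION_DIM = 7      # hypothetical robot action size (e.g. 6-DoF delta + gripper)

rng = random.Random(0)


def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


# Hypothetical parameters: random here, learned in a real model.
codebook = [rand_vec(FEAT_DIM) for _ in range(CODEBOOK_SIZE)]
decoder = [rand_vec(FEAT_DIM) for _ in range(ACTION_DIM)]  # ACTION_DIM x FEAT_DIM


def encode(feat_t, feat_next):
    """Toy encoder: change in visual features between two frames."""
    return [b - a for a, b in zip(feat_t, feat_next)]


def quantize(z):
    """Vector quantization: index of the nearest codebook entry (squared L2)."""
    dists = [sum((zi - ci) ** 2 for zi, ci in zip(z, c)) for c in codebook]
    return min(range(len(codebook)), key=dists.__getitem__)


def decode_action(token):
    """Hypothetical embodiment-specific head: linear map from code to action."""
    code = codebook[token]
    return [sum(w * c for w, c in zip(row, code)) for row in decoder]


feat_t = rand_vec(FEAT_DIM)      # stand-in features for frame t
feat_next = rand_vec(FEAT_DIM)   # stand-in features for a later frame
token = quantize(encode(feat_t, feat_next))
action = decode_action(token)
print(token, [round(a, 3) for a in action])
```

In this toy setup the token vocabulary is shared, while only the small decoding head is robot-specific; one reading of "efficient latent action decoding" is that swapping such a head adapts a token-predicting policy to a new robot, but that is an interpretation, not a description of the paper's exact design.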

Qingwen Bu, Yanting Yang, Jisong Cai, Shenyuan Gao, Guanghui Ren, Maoqing Yao, Ping Luo, Hongyang Li • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Robot Manipulation | LIBERO | Object Achievement | 98.8 | 957
Robotic Manipulation | LIBERO | Spatial Success Rate | 96.5 | 527
Robotic Manipulation | LIBERO-Plus | Language Understanding Score | 71.8 | 249
Robot Manipulation | LIBERO (test) | Average Success Rate | 95.5 | 220
Robotic Manipulation | Calvin ABCD→D | Avg Length | 3.8 | 130
Robot Manipulation | LIBERO Object | Success Rate | 96.8 | 127
Long-horizon robot manipulation | Calvin ABCD→D | Task 1 Completion Rate | 95.5 | 127
Robot Manipulation | SimplerEnv WidowX | Success Rate: Put Spoon on Towel | 83.3 | 98
Robotic Manipulation | LIBERO Long | Success Rate | 92 | 91
Robotic Manipulation | LIBERO v1 (test) | Average Success Rate | 95.2 | 83
Showing 10 of 99 rows
...
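The two Calvin ABCD→D rows report related metrics. Under the standard CALVIN long-horizon protocol (summarized from general knowledge of the benchmark, not from this page), each rollout is a chain of five language instructions that stops at the first failed subtask. "Task 1 Completion Rate" is the share of rollouts that finish at least the first subtask, and "Avg Length" is the mean number of consecutively completed subtasks, which equals the sum of the five "at least i completed" rates. A minimal sketch of that bookkeeping, using made-up rollout outcomes (not UniVLA's results) and a hypothetical helper name `calvin_metrics`:

```python
def calvin_metrics(completed, chain_len=5):
    """Return (at-least-i completion rates for i = 1..chain_len, average length).

    completed[k] is how many consecutive subtasks rollout k finished
    before its first failure (0..chain_len).
    """
    n = len(completed)
    rates = [sum(c >= i for c in completed) / n for i in range(1, chain_len + 1)]
    avg_len = sum(completed) / n
    return rates, avg_len


# Hypothetical rollout outcomes, for illustration only.
completed = [5, 4, 3, 5, 2, 0, 4, 5]
rates, avg_len = calvin_metrics(completed)
print(rates)    # [0.875, 0.875, 0.75, 0.625, 0.375]
print(avg_len)  # 3.5
```

With these hypothetical outcomes the average length is 3.5 and the five rates sum to the same value; read the same way, an Avg Length of 3.8 means a rollout completes 3.8 of its 5 chained subtasks on average.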

