
LoLA: Long Horizon Latent Action Learning for General Robot Manipulation

About

Performing long-horizon, language-guided robotic manipulation critically relies on leveraging historical information and generating coherent action sequences, capabilities that existing Vision-Language-Action (VLA) models often overlook. To address this challenge, we propose LoLA (Long Horizon Latent Action Learning), a framework for robot manipulation that integrates long-term multi-view observations and robot proprioception to enable multi-step reasoning and action generation. We first employ Vision-Language Models to encode rich contextual features from historical sequences and multi-view observations. We further introduce a key module, State-Aware Latent Re-representation, which transforms visual inputs and language commands into an actionable robot motion space. Unlike existing VLA approaches that merely concatenate robot proprioception (e.g., joint angles) with VL embeddings, this module leverages robot states to explicitly ground VL representations in physical scale through a learnable "embodiment-anchored" latent space. We trained LoLA on diverse robotic pre-training datasets and conducted extensive evaluations on simulation benchmarks (SIMPLER and LIBERO), as well as two real-world task suites on Franka and bi-manual Aloha robots. Results show that LoLA significantly outperforms prior state-of-the-art methods (e.g., pi0), particularly on long-horizon manipulation tasks.
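The abstract contrasts the State-Aware Latent Re-representation module with plain concatenation of proprioception and VL embeddings. The following is a minimal, hypothetical sketch of that idea: robot state modulates VL features around a learned embodiment anchor in a shared latent space, rather than being appended to them. All class/variable names, dimensions, and the specific gating form are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

class StateAwareLatentRerep:
    """Illustrative sketch (not the paper's code): project VL embeddings
    and robot proprioception into a shared latent space, then let the
    state gate the VL features around a learned 'embodiment anchor'."""

    def __init__(self, vl_dim=512, state_dim=7, latent_dim=64, seed=0):
        rng = np.random.default_rng(seed)
        # Stand-ins for learnable parameters (randomly initialized here).
        self.W_vl = rng.standard_normal((vl_dim, latent_dim)) * 0.02
        self.W_state = rng.standard_normal((state_dim, latent_dim)) * 0.02
        self.anchor = rng.standard_normal(latent_dim) * 0.02  # embodiment anchor

    def forward(self, vl_emb, joint_angles):
        # vl_emb: (T, vl_dim) VL features; joint_angles: (T, state_dim) proprioception.
        z_vl = vl_emb @ self.W_vl            # VL features in latent space
        z_state = joint_angles @ self.W_state  # robot state in latent space
        # State acts multiplicatively on VL features (a gate), anchoring
        # them in physical scale, instead of plain concatenation.
        gate = np.tanh(z_state + self.anchor)
        return z_vl * gate                   # (T, latent_dim) action-space features

module = StateAwareLatentRerep()
vl_emb = np.zeros((10, 512))   # 10 timesteps of VL features
joints = np.zeros((10, 7))     # 7-DoF proprioception (e.g., a Franka arm)
latent = module.forward(vl_emb, joints)
print(latent.shape)  # (10, 64)
```

The multiplicative gate is one simple way state can re-scale visual-language features; the actual module may use attention or a different fusion, which the page does not specify.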

Xiaofan Wang, Xingyu Gao, Jianlong Fu, Zuolei Li, Dean Fortier, Galen Mullins, Andrey Kolobov, Baining Guo • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Robot Manipulation | LIBERO | Goal Achievement | 97.2 | 494
Robotic Manipulation | SIMPLER Visual Matching (WidowX robot) | Put Spoon on Towel Score | 95.8 | 24
Robotic Manipulation | BusyBox, Bi-Manual Aloha (6 canonical tasks) | Average Success Rate | 46.7 | 3
Multi-step robotic manipulation | Franka robot, real-world scenarios (test) | Success Rate (T1-T3) | 5.9 | 3
Robotic Manipulation | Real-world Single-Step Robotic Tasks | T1 Success Rate | 15.4 | 3
