
LoLA: Long Horizon Latent Action Learning for General Robot Manipulation

About

Performing long-horizon, language-guided robotic manipulation critically relies on leveraging historical information and generating coherent action sequences, capabilities that existing Vision-Language-Action (VLA) models often overlook. To address this challenge, we propose LoLA (Long Horizon Latent Action Learning), a framework for robot manipulation that integrates long-term multi-view observations and robot proprioception to enable multi-step reasoning and action generation. We first employ Vision-Language Models to encode rich contextual features from historical sequences and multi-view observations. We further introduce a key module, State-Aware Latent Re-representation, which transforms visual inputs and language commands into an actionable robot motion space. Unlike existing VLA approaches that merely concatenate robot proprioception (e.g., joint angles) with VL embeddings, this module leverages robot states to explicitly ground VL representations in physical scale through a learnable "embodiment-anchored" latent space. We trained LoLA on diverse robotic pre-training datasets and conducted extensive evaluations on simulation benchmarks (SIMPLER and LIBERO), as well as two real-world task suites on Franka and bi-manual Aloha robots. Results show that LoLA significantly outperforms prior state-of-the-art methods (e.g., pi0), particularly on long-horizon manipulation tasks.
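The abstract contrasts the State-Aware Latent Re-representation module with plain concatenation of proprioception and VL embeddings. The following is a minimal, hypothetical sketch of that idea: robot state modulates VL features around a learned embodiment anchor in a shared latent space, rather than being appended to them. All class/variable names, dimensions, and the specific gating form are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

class StateAwareLatentRerep:
    """Illustrative sketch (not the paper's code): project VL embeddings
    and robot proprioception into a shared latent space, then let the
    state gate the VL features around a learned 'embodiment anchor'."""

    def __init__(self, vl_dim=512, state_dim=7, latent_dim=64, seed=0):
        rng = np.random.default_rng(seed)
        # Stand-ins for learnable parameters (randomly initialized here).
        self.W_vl = rng.standard_normal((vl_dim, latent_dim)) * 0.02
        self.W_state = rng.standard_normal((state_dim, latent_dim)) * 0.02
        self.anchor = rng.standard_normal(latent_dim) * 0.02  # embodiment anchor

    def forward(self, vl_emb, joint_angles):
        # vl_emb: (T, vl_dim) VL features; joint_angles: (T, state_dim) proprioception.
        z_vl = vl_emb @ self.W_vl            # VL features in latent space
        z_state = joint_angles @ self.W_state  # robot state in latent space
        # State acts multiplicatively on VL features (a gate), anchoring
        # them in physical scale, instead of plain concatenation.
        gate = np.tanh(z_state + self.anchor)
        return z_vl * gate                   # (T, latent_dim) action-space features

module = StateAwareLatentRerep()
vl_emb = np.zeros((10, 512))   # 10 timesteps of VL features
joints = np.zeros((10, 7))     # 7-DoF proprioception (e.g., a Franka arm)
latent = module.forward(vl_emb, joints)
print(latent.shape)  # (10, 64)
```

The multiplicative gate is one simple way state can re-scale visual-language features; the actual module may use attention or a different fusion, which the page does not specify.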

Xiaofan Wang, Xingyu Gao, Jianlong Fu, Zuolei Li, Dean Fortier, Galen Mullins, Andrey Kolobov, Baining Guo • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Robot Manipulation | LIBERO | Goal Achievement | 97.2 | 494
Robotic Manipulation | SIMPLER Visual Matching (WidowX robot) | Put Spoon on Towel Score | 95.8 | 24
Robotic Manipulation | BusyBox, Bi-Manual Aloha (6 canonical tasks) | Average Success Rate | 46.7 | 3
Multi-step robotic manipulation | Franka robot, real-world scenarios (test) | Success Rate (T1-T3) | 5.9 | 3
Robotic Manipulation | Real-world Single-Step Robotic Tasks | T1 Success Rate | 15.4 | 3
