Causal World Modeling for Robot Control

About

This work highlights that video world modeling, alongside vision-language pre-training, establishes a fresh and independent foundation for robot learning. Intuitively, video world models provide the ability to imagine the near future by understanding the causality between actions and visual dynamics. Inspired by this, we introduce LingBot-VA, an autoregressive diffusion framework that learns frame prediction and policy execution simultaneously. Our model features three carefully crafted designs: (1) a shared latent space that integrates vision and action tokens, driven by a Mixture-of-Transformers (MoT) architecture; (2) a closed-loop rollout mechanism that continually acquires environmental feedback from ground-truth observations; and (3) an asynchronous inference pipeline that parallelizes action prediction and motor execution to support efficient control. We evaluate our model on both simulation benchmarks and real-world scenarios, where it shows significant promise in long-horizon manipulation, data efficiency in post-training, and strong generalizability to novel configurations. The code and model are made publicly available to support the community.

Lin Li, Qihang Zhang, Yiming Luo, Shuai Yang, Ruilin Wang, Fei Han, Mingrui Yu, Zelin Gao, Nan Xue, Xing Zhu, Yujun Shen, Yinghao Xu • 2026
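
The asynchronous inference pipeline mentioned in the abstract (action prediction running in parallel with motor execution) can be illustrated with a small toy sketch. This is not the LingBot-VA implementation: `predict_chunk`, `run_async_control`, the queue sizes, the integer "state", and the choice to condition each prediction on the chunk still pending execution are all hypothetical assumptions made only to show the overlap pattern.

```python
import queue
import threading


def predict_chunk(observation, pending, chunk_size=4):
    """Hypothetical stand-in for a policy (NOT the LingBot-VA model).

    Predicts the next `chunk_size` target positions, continuing from the
    last action still pending execution, or from the observation if none.
    """
    start = pending[-1] if pending else observation
    return [start + i + 1 for i in range(chunk_size)]


def run_async_control(num_chunks=3, chunk_size=4):
    """Toy loop: a worker thread predicts chunk k+1 while chunk k executes."""
    request_q = queue.Queue(maxsize=1)  # (observation, pending actions)
    action_q = queue.Queue(maxsize=1)   # at most one chunk predicted ahead

    def predictor():
        while True:
            request = request_q.get()
            if request is None:  # sentinel: shut the worker down
                break
            obs, pending = request
            action_q.put(predict_chunk(obs, pending, chunk_size))

    worker = threading.Thread(target=predictor, daemon=True)
    worker.start()

    state = 0        # toy observation: current integer position
    executed = []
    request_q.put((state, []))  # request the first chunk
    for k in range(num_chunks):
        chunk = action_q.get()
        if k + 1 < num_chunks:
            # Ask for the next chunk BEFORE executing this one, so that
            # prediction overlaps with execution.
            request_q.put((state, list(chunk)))
        for action in chunk:
            state = action  # toy "motor execution": move to the target
            executed.append(action)

    request_q.put(None)
    worker.join()
    return executed


if __name__ == "__main__":
    print(run_async_control())
```

In a real controller the execution loop would also feed fresh sensor observations back to the predictor, which is where a closed-loop rollout would come in; the sketch above only shows how prediction and execution can be overlapped with two bounded queues.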

Related benchmarks

Task	Dataset	Metric	Result
Robot Manipulation	LIBERO	Object Achievement	99.6	1025
Robotic Manipulation	LIBERO	Spatial Success Rate	98.5	570
Robot Manipulation	LIBERO (test)	Average Success Rate	98.5	237
Robot Manipulation	LIBERO	Spatial Success Rate	98.5	223
Robotic Manipulation	LIBERO	Long-horizon Success Rate	98.5	165
Robotic Manipulation	LIBERO v1 (test)	Average Success Rate	98.5	118
Robotic Manipulation	RoboTwin 2.0	Average Success Rate	92.2	115
Robotic Manipulation	LIBERO	Long Success Rate	98.5	108
Robot Manipulation	RoboTwin Randomized 2.0	Overall Success Rate	91.5	100
Robot Manipulation	LIBERO	Spatial Success	98.5	90

Showing 10 of 62 rows
