Being-H0.7: A Latent World-Action Model from Egocentric Videos

About

Visual-Language-Action models (VLAs) have advanced generalist robot control by mapping multimodal observations and language instructions directly to actions, but sparse action supervision often encourages shortcut mappings rather than representations of dynamics, contact, and task progress. Recent world-action models introduce future prediction through video rollouts, yet pixel-space prediction is a costly and indirect substrate for control: it may model visual details irrelevant to action generation, and it adds substantial training or inference overhead. We present Being-H0.7, a latent world-action model that brings future-aware reasoning into VLA-style policies without generating future frames. Being-H0.7 inserts learnable latent queries between perception and action as a compact reasoning interface, and trains them with a future-informed dual-branch design: a deployable prior branch infers latent states from the current context, while a training-only posterior branch replaces the queries with embeddings from future observations. Jointly aligning the two branches in the latent reasoning space leads the prior branch to infer future-aware, action-useful structure from current observations alone. At inference, Being-H0.7 discards the posterior branch and performs no visual rollout. Experiments across six simulation benchmarks and diverse real-world tasks show that Being-H0.7 achieves state-of-the-art or comparable performance, combining the predictive benefits of world models with the efficiency and deployability of direct VLA policies.
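To make the dual-branch idea concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation. Every name in it (`attend`, `action_head`, `dual_branch_losses`, `act`, `w_future`, `align_weight`) is hypothetical. A single attention read stands in for the VLA backbone, mean squared error stands in for whatever action and alignment objectives the paper actually uses, and no gradient step is shown.

```python
import math
import random

def linear(x, w):
    """Map a vector x (length d_in) through a weight matrix w (d_in x d_out)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def attend(slots, context):
    """Single-head dot-product attention: each slot reads from the context tokens."""
    d = len(slots[0])
    out = []
    for s in slots:
        scores = [sum(a * b for a, b in zip(s, c)) / math.sqrt(d) for c in context]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(v - top) for v in scores]
        total = sum(weights)
        out.append([sum(w / total * c[k] for w, c in zip(weights, context))
                    for k in range(d)])
    return out

def mse(a, b):
    """Mean squared error between two equally shaped lists of vectors."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def action_head(latents, w_action):
    """Mean-pool the latent slots, then project to an action vector."""
    d = len(latents[0])
    pooled = [sum(l[k] for l in latents) / len(latents) for k in range(d)]
    return linear(pooled, w_action)

def dual_branch_losses(context, future_feats, target_action, params, align_weight=1.0):
    """Compute toy training losses for the prior and posterior branches."""
    # Prior branch (deployable): learnable queries read the current context.
    prior = attend(params["queries"], context)
    # Posterior branch (training only): the query slots are replaced by
    # projected embeddings of future observations (assumed one per query).
    future_slots = [linear(f, params["w_future"]) for f in future_feats]
    posterior = attend(future_slots, context)
    # Both branches predict the action; MSE is an assumed stand-in objective.
    prior_loss = mse([action_head(prior, params["w_action"])], [target_action])
    post_loss = mse([action_head(posterior, params["w_action"])], [target_action])
    # Alignment pulls the prior latents toward the future-informed latents.
    align = mse(prior, posterior)
    return {"prior_action": prior_loss, "posterior_action": post_loss,
            "align": align,
            "total": prior_loss + post_loss + align_weight * align}

def act(context, params):
    """Inference: prior branch only, no future frames and no posterior branch."""
    return action_head(attend(params["queries"], context), params["w_action"])

def rand_matrix(rng, rows, cols, scale=1.0):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

# Toy sizes, chosen arbitrarily for illustration.
rng = random.Random(0)
D, N_QUERIES, N_CONTEXT, D_FUTURE, ACTION_DIM = 8, 4, 6, 5, 3
params = {
    "queries": rand_matrix(rng, N_QUERIES, D),        # learnable latent queries
    "w_future": rand_matrix(rng, D_FUTURE, D, 0.1),   # future-embedding projection
    "w_action": rand_matrix(rng, D, ACTION_DIM, 0.1), # action head weights
}
context = rand_matrix(rng, N_CONTEXT, D)               # current observation tokens
future_feats = rand_matrix(rng, N_QUERIES, D_FUTURE)   # future observation features
target_action = [0.1, -0.2, 0.3]

losses = dual_branch_losses(context, future_feats, target_action, params)
action = act(context, params)
```

In this sketch `act` never touches `w_future` or any future features, which mirrors the claim that the posterior branch is discarded at inference and no visual rollout is performed.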

Hao Luo, Wanpeng Zhang, Yicheng Feng, Sipeng Zheng, Haiweng Xu, Chaoyi Xu, Ziheng Xi, Yuhui Fu, Zongqing Lu • 2026

Related benchmarks

Task	Dataset	Metric	Result	(unlabeled)
Robotic Manipulation	LIBERO-Plus	--	--	414
Robotic Manipulation	LIBERO v1 (test)	Average Success Rate	99.2	118
Robotic Manipulation	RoboCasa	--	--	68
Robotic Manipulation	LIBERO-Plus (test)	Lighting Robustness Score	97.8	52
Bimanual Manipulation	RoboTwin Clean setting 2.0	Success Rate	90.2	36
Robotic Manipulation	RoboTwin 50-task (Seen Tasks)	Average Success Rate	89.9	27
Bimanual Manipulation	RoboTwin 2.0 (random)	Success Rate	89.6	26
Bimanual Manipulation	RoboTwin 2.0	Success Rate	89.8	25
Robotic Manipulation	RoboTwin Hard 2.0	Overall Success Rate	89.6	21
Robotic Manipulation	RoboTwin Easy	Average Success Rate (AVG)	90.2	18

Showing 10 of 22 rows
