Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

About

We introduce Genie Envisioner (GE), a unified world foundation platform for robotic manipulation that integrates policy learning, evaluation, and simulation within a single video-generative framework. At its core, GE-Base is a large-scale, instruction-conditioned video diffusion model that captures the spatial, temporal, and semantic dynamics of real-world robotic interactions in a structured latent space. Built upon this foundation, GE-Act maps latent representations to executable action trajectories through a lightweight flow-matching decoder, enabling precise and generalizable policy inference across diverse embodiments with minimal supervision. To support scalable evaluation and training, GE-Sim serves as an action-conditioned neural simulator, producing high-fidelity rollouts for closed-loop policy development. The platform is further equipped with EWMBench, a standardized benchmark suite measuring visual fidelity, physical consistency, and instruction-action alignment. Together, these components establish Genie Envisioner as a scalable and practical foundation for instruction-driven, general-purpose embodied intelligence. All code, models, and benchmarks will be released publicly.
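
The abstract describes GE-Act's flow-matching decoder only at a high level. In flow matching generally, a network learns a velocity field, and at inference time a sample is produced by integrating that field from a noise vector at t = 0 to t = 1. The sketch below illustrates only that inference step with fixed-step Euler updates. It is not GE-Act's implementation: the names `euler_flow_sample` and `make_toy_velocity` are hypothetical, the velocity field is an analytic stand-in for a learned network conditioned on GE-Base latents, and the flattened action layout and step count are assumptions chosen for illustration.

```python
from typing import Callable, List

Vector = List[float]
VelocityField = Callable[[Vector, float], Vector]


def euler_flow_sample(velocity: VelocityField, x0: Vector, num_steps: int = 10) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with fixed-step Euler updates."""
    if num_steps < 1:
        raise ValueError("num_steps must be >= 1")
    dt = 1.0 / num_steps
    x = list(x0)
    for i in range(num_steps):
        t = i * dt  # t stays strictly below 1, so the toy field never divides by zero
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target: Vector) -> VelocityField:
    """Hypothetical stand-in for a learned, latent-conditioned velocity network.

    Along the straight path x_t = (1 - t) * x0 + t * x1, the velocity that moves
    the current point x_t onto the endpoint x1 is (x1 - x_t) / (1 - t).
    """
    def velocity(x: Vector, t: float) -> Vector:
        return [(goal_i - xi) / (1.0 - t) for xi, goal_i in zip(x, target)]
    return velocity


# Illustrative values only: a flattened 2-waypoint action chunk (x, y per waypoint).
goal = [0.2, 0.5, 0.3, 0.6]
noise = [1.0, -1.0, 0.5, 0.0]
actions = euler_flow_sample(make_toy_velocity(goal), noise, num_steps=10)
print([round(a, 6) for a in actions])
```

Because the toy field always points straight at its endpoint, the Euler updates telescope and land on `goal` up to floating-point error. A learned field only approximates the ideal velocity, so in practice the number of integration steps trades inference speed against trajectory accuracy.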

Yue Liao, Pengfei Zhou, Siyuan Huang, Donglin Yang, Shengcong Chen, Yuxin Jiang, Yue Hu, Jingbin Cai, Si Liu, Jianlan Luo, Liliang Chen, Shuicheng Yan, Maoqing Yao, Guanghui Ren • 2025

Related benchmarks

Task	Dataset	Metric	Result
Robot Manipulation	LIBERO	Object Achievement	97.6	1025
Robotic Manipulation	LIBERO-Plus	Language Understanding Score	77.4	414
Robotic Manipulation	LIBERO	Long-horizon Success Rate	94.4	165
Robotic Manipulation	LIBERO v1 (test)	Average Success Rate	96.5	118
Robotic Manipulation	LIBERO (test)	Object Success Rate	97.6	85
Robot Manipulation	LIBERO-Plus Zero-shot	Camera Score	60.7	59
Video Generation	WorldArena	Interaction Quality	19.8	14
Action-Conditioned Video Generation	WorldArena	EWMScore	68.26	10
Cucumber Peeling	Real-world visuo-tactile dataset	Success Rate	10	10
Robot Policy Evaluation	LIBERO-Plus Spatial	Success Rate (Bg.)	89.1	9

10 of 24 related benchmark results are listed.
