
Matrix-Game 2.0: An open-source, real-time, and streaming interactive world model

About

Recent advances in interactive video generation have demonstrated the potential of diffusion models as world models that capture complex physical dynamics and interactive behaviors. However, existing interactive world models depend on bidirectional attention and lengthy inference steps, which severely limits real-time performance. Consequently, they struggle to simulate real-world dynamics, where outcomes must update instantaneously based on historical context and current actions. To address this, we present Matrix-Game 2.0, an interactive world model that generates long videos on the fly via few-step auto-regressive diffusion. Our framework consists of three key components: (1) a scalable data production pipeline for Unreal Engine and GTA5 environments that produces large amounts (about 1,200 hours) of video data with diverse interaction annotations; (2) an action injection module that accepts frame-level mouse and keyboard inputs as interactive conditions; (3) a few-step distillation based on the causal architecture for real-time and streaming video generation. Matrix-Game 2.0 can generate high-quality minute-level videos across diverse scenes at 25 FPS. We open-source our model weights and codebase to advance research in interactive world modeling.
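The streaming loop the abstract describes can be pictured with a toy sketch. This is not the authors' code or architecture: `Action`, `toy_denoise_step`, `stream_frames`, the 4 denoising steps, and the 8-frame context window are all hypothetical choices made for illustration, and the "denoiser" is a simple arithmetic placeholder rather than a neural network. It only shows the control flow: each frame starts from noise, is refined in a few steps, is conditioned on past frames and the current frame-level action, and is emitted before the next frame begins.

```python
from collections import deque
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class Action:
    """Hypothetical frame-level input: pressed keys plus mouse movement."""
    keys: tuple
    mouse_dx: float
    mouse_dy: float


def toy_denoise_step(latent, context, action, step, num_steps):
    """Placeholder for one denoising step (NOT a real diffusion network).

    The target depends only on past frames (causal) and the current action.
    """
    prev = context[-1] if context else [0.0] * len(latent)
    shift = action.mouse_dx + (1.0 if "W" in action.keys else 0.0)
    target = [p + shift for p in prev]
    blend = (step + 1) / num_steps  # reaches 1.0 on the final step
    return [(1 - blend) * x + blend * t for x, t in zip(latent, target)]


def stream_frames(actions, frame_dim=4, num_steps=4, context_len=8, seed=0):
    """Yield one frame per action, using a rolling window of past frames."""
    rng = random.Random(seed)
    context = deque(maxlen=context_len)  # oldest frames fall out automatically
    for action in actions:
        latent = [rng.gauss(0.0, 1.0) for _ in range(frame_dim)]  # start from noise
        for step in range(num_steps):  # few-step refinement
            latent = toy_denoise_step(latent, list(context), action, step, num_steps)
        context.append(latent)
        yield latent  # emitted immediately, before later actions are read


def frame_budget_ms(fps):
    """Wall-clock time available per frame at a given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    forward = Action(keys=("W",), mouse_dx=0.0, mouse_dy=0.0)
    for i, frame in enumerate(stream_frames([forward] * 3)):
        print(i, frame)
    print("budget per frame at 25 FPS (ms):", frame_budget_ms(25))
```

Because the generator yields each frame before reading the next action, the caller can feed inputs as they arrive. At the reported 25 FPS, all refinement steps for a frame would have to fit within a budget of 1000 / 25 = 40 ms, and one minute of video corresponds to 25 × 60 = 1,500 frames.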

Xianglong He, Chunli Peng, Zexiang Liu, Boyang Wang, Yifan Zhang, Qi Cui, Fei Kang, Biao Jiang, Mengyin An, Yangyang Ren, Baixin Xu, Hao-Xiang Guo, Kaixiong Gong, Size Wu, Wei Li, Xuchen Song, Yang Liu, Yangguang Li, Yahui Zhou • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Video Generation | VBench | Quality Score 72.15 | 102 |
| Video Generation | VBench Long | -- | 14 |
| Action-controlled Video Generation | WorldPlay Long-term ≥ 250 frames (test) | PSNR 9.57 | 9 |
| Action-controlled Video Generation | WorldPlay Short-term 61 frames (test) | PSNR 17.26 | 9 |
| Controllable Video Generation | LongVGenBench (test) | Appearance Quality (A.Q.) 55.24 | 8 |
| Interactive World Modeling | General Game World Modeling | Resolution 480 | 6 |
| Video Generation | VBench-Long User Study | Video Quality 23.2 | 6 |
| Interactive World Modeling | User Study | Memory Score 2.98 | 5 |
| Action-Conditioned Video Generation | Astra-Bench | Rotation Error 2.25 | 5 |
| Controllable Video Generation | LongVGenBench Human Evaluation | VQ (Video Quality) 2.12 | 5 |
Showing 10 of 16 rows
