Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory

About

With the advancement of interactive video generation, diffusion models have increasingly demonstrated their potential as world models. However, existing approaches still struggle to simultaneously achieve memory-enabled long-term temporal consistency and high-resolution real-time generation, limiting their applicability in real-world scenarios. To address this, we present Matrix-Game 3.0, a memory-augmented interactive world model designed for 720p real-time long-form video generation. Building upon Matrix-Game 2.0, we introduce systematic improvements across data, model, and inference. First, we develop an upgraded industrial-scale infinite data engine that integrates Unreal Engine-based synthetic data, large-scale automated collection from AAA games, and real-world video augmentation to produce high-quality Video-Pose-Action-Prompt quadruplet data at scale. Second, we propose a training framework for long-horizon consistency: by modeling prediction residuals and re-injecting imperfect generated frames during training, the base model learns self-correction; meanwhile, camera-aware memory retrieval and injection enable the base model to achieve long-horizon spatiotemporal consistency. Third, we design a multi-segment autoregressive distillation strategy based on Distribution Matching Distillation (DMD), combined with model quantization and VAE decoder pruning, to achieve efficient real-time inference. Experimental results show that Matrix-Game 3.0 achieves up to 40 FPS real-time generation at 720p resolution with a 5B model, while maintaining stable memory consistency over minute-long sequences. Scaling up to a 2x14B model further improves generation quality, dynamics, and generalization. Our approach provides a practical pathway toward industrial-scale deployable world models.

Zile Wang, Zexiang Liu, Jiaxing Li, Kaichen Huang, Baixin Xu, Fei Kang, Mengyin An, Peiyu Wang, Biao Jiang, Yichen Wei, Yidan Xietian, Jiangbo Pei, Liang Hu, Boyi Jiang, Hua Xue, Zidong Wang, Haofeng Sun, Wei Li, Wanli Ouyang, Xianglong He, Yang Liu, Yangguang Li, Yahui Zhou • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Long-horizon Video Generation | nuScenes (val) | FID | 35.69 | 9 |
| World Modeling | 1-min world modeling benchmark Simple-Trajectory | R (Trajectory Fidelity) | 12.96 | 6 |
| World Modeling | 1-min World Modeling Benchmark Hard-Trajectory | R Score | 18.79 | 6 |
| World Modeling | 60-second benchmark (Hard-Trajectory) | PSNR | 12.17 | 6 |
| World Modeling | 60-second benchmark (Simple-Trajectory) | PSNR | 12.29 | 6 |
| Game World Modeling | SF2 | Gemini Score | 3 | 5 |
| Game World Modeling | SF3 | Gemini Score | 32.5 | 5 |
| Action Controllability | CrossFPS (unseen scenes) | Fire Controllability | 0.00e+0 | 4 |
| Controllable Video Generation | CrossFPS (test) | JEPA | 0.366 | 4 |