
Matrix-Game: Interactive World Foundation Model

About

We introduce Matrix-Game, an interactive world foundation model for controllable game world generation. Matrix-Game is trained using a two-stage pipeline that first performs large-scale unlabeled pretraining for environment understanding, followed by action-labeled training for interactive video generation. To support this, we curate Matrix-Game-MC, a comprehensive Minecraft dataset comprising over 2,700 hours of unlabeled gameplay video clips and over 1,000 hours of high-quality labeled clips with fine-grained keyboard and mouse action annotations. Our model adopts a controllable image-to-world generation paradigm, conditioned on a reference image, motion context, and user actions. With over 17 billion parameters, Matrix-Game enables precise control over character actions and camera movements, while maintaining high visual quality and temporal coherence. To evaluate performance, we develop GameWorld Score, a unified benchmark measuring visual quality, temporal quality, action controllability, and physical rule understanding for Minecraft world generation. Extensive experiments show that Matrix-Game consistently outperforms prior open-source Minecraft world models (including Oasis and MineWorld) across all metrics, with particularly strong gains in controllability and physical consistency. Double-blind human evaluations further confirm the superiority of Matrix-Game, highlighting its ability to generate perceptually realistic and precisely controllable videos across diverse game scenarios. To facilitate future research on interactive image-to-world generation, we will open-source the Matrix-Game model weights and the GameWorld Score benchmark at https://github.com/SkyworkAI/Matrix-Game.

Yifan Zhang, Chunli Peng, Boyang Wang, Puyi Wang, Qingcheng Zhu, Fei Kang, Biao Jiang, Zedong Gao, Eric Li, Yang Liu, Yahui Zhou • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Visual generation | 2D trajectory dataset | LPIPS | 0.589 | 16 |
| View Synthesis | ViewBench 30 deg | PSNR | 14.27 | 6 |
| View Synthesis | ViewBench 75 deg | PSNR | 13.46 | 6 |
| Image-to-Video Generation | Yume-Bench | Image Fidelity (IF) | 27.1 | 4 |
| Visual Navigation | 2D Navigation | ATE | 14.75 | 4 |
| Novel View Synthesis | ViewBench 45° rotation magnitude | PSNR | 13.55 | 3 |
| Novel View Synthesis | ViewBench 90° rotation | PSNR | 12.41 | 3 |
| Novel View Synthesis | ViewBench 180° rotation magnitude | PSNR | 12.74 | 3 |
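
Several of the ViewBench results listed here are reported as PSNR (peak signal-to-noise ratio, in dB; higher is better). For readers unfamiliar with the metric, the following is a minimal sketch of its standard definition, assuming 8-bit pixel intensities flattened into plain Python lists. The function name `psnr` and the `max_value` parameter are illustrative choices, not taken from the Matrix-Game or ViewBench code, and the benchmark's actual evaluation protocol (resolution, color space, averaging over frames) may differ.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Return PSNR in dB between two equal-length sequences of pixel values.

    Standard definition: 10 * log10(max_value**2 / MSE).
    `max_value` is the largest possible pixel value (255 for 8-bit images).
    """
    if len(reference) != len(generated) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and have the same length")

    # Mean squared error over all pixel positions.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)

    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded

    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ref = [0, 0, 0, 0]
    gen = [0, 0, 0, 10]
    print(f"PSNR: {psnr(ref, gen):.2f} dB")
```

Because PSNR is logarithmic in the mean squared error, values in the 12 to 14 dB range, as in the rows here, correspond to large pixel-level differences from the reference views.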
