Matrix-Game: Interactive World Foundation Model
About
We introduce Matrix-Game, an interactive world foundation model for controllable game world generation. Matrix-Game is trained using a two-stage pipeline that first performs large-scale unlabeled pretraining for environment understanding, followed by action-labeled training for interactive video generation. To support this, we curate Matrix-Game-MC, a comprehensive Minecraft dataset comprising over 2,700 hours of unlabeled gameplay video clips and over 1,000 hours of high-quality labeled clips with fine-grained keyboard and mouse action annotations. Our model adopts a controllable image-to-world generation paradigm, conditioned on a reference image, motion context, and user actions. With over 17 billion parameters, Matrix-Game enables precise control over character actions and camera movements, while maintaining high visual quality and temporal coherence. To evaluate performance, we develop GameWorld Score, a unified benchmark measuring visual quality, temporal quality, action controllability, and physical rule understanding for Minecraft world generation. Extensive experiments show that Matrix-Game consistently outperforms prior open-source Minecraft world models (including Oasis and MineWorld) across all metrics, with particularly strong gains in controllability and physical consistency. Double-blind human evaluations further confirm the superiority of Matrix-Game, highlighting its ability to generate perceptually realistic and precisely controllable videos across diverse game scenarios. To facilitate future research on interactive image-to-world generation, we will open-source the Matrix-Game model weights and the GameWorld Score benchmark at https://github.com/SkyworkAI/Matrix-Game.
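The abstract says generation is conditioned on a reference image, motion context, and user keyboard and mouse actions. The sketch below is a rough illustration of how a per-frame action annotation could become a numeric conditioning signal. Pressed keys are encoded as a multi-hot vector, and the mouse deltas are appended after them. Everything here is a hypothetical assumption for illustration and is not Matrix-Game's actual action representation or API. That includes the key vocabulary `KEY_VOCAB`, the helpers `encode_action` and `encode_clip`, the `ImageToWorldCondition` bundle, and the encoding scheme itself.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, List, Sequence, Tuple

# Hypothetical key vocabulary; the real Matrix-Game action space may differ.
KEY_VOCAB: Tuple[str, ...] = ("forward", "back", "left", "right", "jump", "attack")


def encode_action(pressed: Iterable[str], mouse_dx: float, mouse_dy: float,
                  vocab: Sequence[str] = KEY_VOCAB) -> List[float]:
    """Encode one frame's action as multi-hot key flags followed by mouse deltas."""
    pressed_set = set(pressed)
    unknown = pressed_set - set(vocab)
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    key_flags = [1.0 if key in pressed_set else 0.0 for key in vocab]
    return key_flags + [float(mouse_dx), float(mouse_dy)]


def encode_clip(actions: Iterable[Tuple[Iterable[str], float, float]]) -> List[List[float]]:
    """Encode (pressed_keys, mouse_dx, mouse_dy) tuples, one row per frame."""
    return [encode_action(pressed, dx, dy) for pressed, dx, dy in actions]


@dataclass
class ImageToWorldCondition:
    """Hypothetical bundle of the three conditioning signals named in the abstract."""
    reference_image: Any                                       # e.g. an H x W x 3 array
    motion_context: List[Any] = field(default_factory=list)    # preceding frames
    actions: List[List[float]] = field(default_factory=list)   # one row per frame to generate


cond = ImageToWorldCondition(
    reference_image=None,  # placeholder; a real pipeline would pass an image here
    actions=encode_clip([({"forward"}, 0.0, 0.0), ({"forward", "jump"}, 2.5, -1.0)]),
)
print(cond.actions[1])  # [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.5, -1.0]
```

A multi-hot layout lets simultaneous key presses (for example, moving forward while jumping) coexist in one vector. Continuous mouse deltas keep camera rotation separate from the discrete movement keys.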
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Visual Generation | 2D trajectory dataset | LPIPS | 0.589 | 16 |
| View Synthesis | ViewBench 30 deg | PSNR | 14.27 | 6 |
| View Synthesis | ViewBench 75 deg | PSNR | 13.46 | 6 |
| Image-to-Video Generation | Yume-Bench | Image Fidelity (IF) | 27.1 | 4 |
| Visual Navigation | 2D Navigation | ATE | 14.75 | 4 |
| Novel View Synthesis | ViewBench 45° rotation magnitude | PSNR | 13.55 | 3 |
| Novel View Synthesis | ViewBench 90° rotation | PSNR | 12.41 | 3 |
| Novel View Synthesis | ViewBench 180° rotation magnitude | PSNR | 12.74 | 3 |
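Two of the metrics in the table have simple closed forms. PSNR, in dB, is 10·log10(MAX² / MSE), and higher is better. ATE (absolute trajectory error) is commonly reported as the root-mean-square position error between a predicted and a ground-truth trajectory, and lower is better. LPIPS, also lower-is-better, depends on a learned network and is not sketched here. The snippet below is a minimal sketch under stated assumptions. Pixel values are flattened into plain lists. The ATE variant skips any trajectory alignment or scale correction, and the benchmarks listed above may apply such steps. The function names `psnr` and `ate_rmse` are illustrative, not part of any benchmark's code.

```python
import math
from typing import Sequence, Tuple


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB over flattened pixel values in [0, max_val]."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_val ** 2 / mse)


def ate_rmse(pred: Sequence[Tuple[float, float]],
             gt: Sequence[Tuple[float, float]]) -> float:
    """RMSE of Euclidean position error over paired 2D points (assumes no alignment step)."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    sq_err = [(px - gx) ** 2 + (py - gy) ** 2 for (px, py), (gx, gy) in zip(pred, gt)]
    return math.sqrt(sum(sq_err) / len(sq_err))


print(round(psnr([0.0, 0.5, 1.0, 1.0], [0.1, 0.5, 0.9, 1.0]), 2))  # 23.01
print(ate_rmse([(3.0, 4.0)], [(0.0, 0.0)]))  # 5.0
```

Because PSNR is logarithmic in the MSE, halving the per-pixel MSE raises PSNR by only about 3 dB. This is worth keeping in mind when comparing nearby ViewBench scores such as 13.46 and 13.55.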