MAGI-1: Autoregressive Video Generation at Scale
About
We present MAGI-1, a world model that generates videos by autoregressively predicting a sequence of video chunks, defined as fixed-length segments of consecutive frames. Trained to denoise per-chunk noise that increases monotonically over time, MAGI-1 enables causal temporal modeling and naturally supports streaming generation. It achieves strong performance on image-to-video (I2V) tasks conditioned on text instructions, providing high temporal consistency and scalability, which are made possible by several algorithmic innovations and a dedicated infrastructure stack. MAGI-1 facilitates controllable generation via chunk-wise prompting and supports real-time, memory-efficient deployment by maintaining constant peak inference cost, regardless of video length. The largest variant of MAGI-1 comprises 24 billion parameters and supports context lengths of up to 4 million tokens, demonstrating the scalability and robustness of our approach. The code and models are available at https://github.com/SandAI-org/MAGI-1 and https://github.com/SandAI-org/MagiAttention. The product can be accessed at https://sand.ai.
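The chunk-wise autoregressive scheme described above can be illustrated with a toy sketch. This is not the MAGI-1 implementation; the denoiser, chunk sizes, and schedule below are all simplified stand-ins chosen for illustration. It shows the two properties the abstract highlights: each chunk is denoised conditioned only on already-completed (causal) context, and peak memory per step is one in-flight chunk plus that context, independent of total video length.

```python
import numpy as np

# Toy sketch (hypothetical, not the actual MAGI-1 code): a video is a
# sequence of fixed-length chunks; each chunk starts as pure noise and is
# denoised over a few steps, conditioned causally on finished chunks.

CHUNK_FRAMES = 4    # frames per chunk (illustrative)
FRAME_DIM = 8       # flattened latent size per frame (illustrative)
DENOISE_STEPS = 5   # denoising steps per chunk (illustrative)

def toy_denoiser(noisy_chunk, context, noise_level):
    """Stand-in for the learned model: pull the chunk toward the mean of
    the causal context (or toward zero when there is no context yet)."""
    target = context.mean(axis=0, keepdims=True) if len(context) else 0.0
    return noisy_chunk + noise_level * (target - noisy_chunk)

def generate_video(num_chunks, rng):
    """Autoregressively emit `num_chunks` chunks; only one noisy chunk is
    ever held in flight, so peak cost does not grow with video length."""
    finished = []  # clean chunks so far -- the causal context
    for _ in range(num_chunks):
        chunk = rng.standard_normal((CHUNK_FRAMES, FRAME_DIM))  # pure noise
        context = (np.concatenate(finished) if finished
                   else np.empty((0, FRAME_DIM)))
        # Noise level decreases monotonically over this chunk's schedule.
        for step in range(DENOISE_STEPS):
            noise_level = 1.0 - step / DENOISE_STEPS
            chunk = toy_denoiser(chunk, context, noise_level)
        finished.append(chunk)  # chunk is clean; it can be streamed out now
    return np.concatenate(finished)

video = generate_video(num_chunks=3, rng=np.random.default_rng(0))
print(video.shape)  # (12, 8): 3 chunks of 4 frames each
```

Because each chunk is finalized before the next begins, frames can be streamed to the viewer as soon as their chunk finishes, which is what makes real-time, constant-memory deployment possible in principle.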
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Generation | VBench 5s | Total Score | 81.99 | 58 |
| Video Generation | VBench (test) | Semantic Score | 67.74 | 48 |
| Video Generation | short videos, 81 frames, 240 prompts | Total Score | 5.25 | 38 |
| Video Generation | VBench 1.0 (test) | Image Quality | 0.6066 | 21 |
| Long Video Generation | 120-, 240-, 720-, and 1440-frame long videos | Total Score | 4.92 | 20 |
| Video Generation | VBench short video (test) | Subject Consistency | 67.74 | 16 |
| Video Generation | VBench | Total Score | 79.18 | 14 |
| Short Video Generation | VBench-Long 60 seconds | Aesthetic Quality | 52.1 | 13 |
| Short Video Generation | VBench 2024 | Total Score | 79.18 | 11 |
| Short Video Generation | VBench official prompts | Total Score | 79.18 | 11 |