Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k
About
Video generation models have achieved remarkable progress in the past year. The quality of AI video continues to improve, but at the cost of larger model size, increased data quantity, and greater demand for training compute. In this report, we present Open-Sora 2.0, a commercial-level video generation model trained for only $200k. With this model, we demonstrate that the cost of training a top-performing video generation model is highly controllable. We detail all techniques that contribute to this efficiency breakthrough, including data curation, model architecture, training strategy, and system optimization. According to human evaluation results and VBench scores, Open-Sora 2.0 is comparable to leading video generation models worldwide, including the open-source HunyuanVideo and the closed-source Runway Gen-3 Alpha. By making Open-Sora 2.0 fully open-source, we aim to democratize access to advanced video generation technology, fostering broader innovation and creativity in content creation. All resources are publicly available at: https://github.com/hpcaitech/Open-Sora.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Text-to-Video Generation | VBench | Quality Score | 82.1 | 168 |
| Video Generation | VBench | Quality Score | 81.35 | 126 |
| Video Generation | VBench | Total Score | 81.71 | 42 |
| 3D Geometry | DL3DV (val) | P-map Error | 14.1 | 18 |
| Instance Grouping | ScanNet (val) | T-SR | 4.3 | 18 |
| Semantic Tagging | ScanNet (val) | APmid | 56.38 | 18 |
| Text-to-Video Generation | VBench T2V 15 | Total Score | 83.6 | 17 |
| Text-to-Video Generation | TV-Align (test) | Counting Alignment | 28.9 | 10 |
| Image-to-Video Generation | VBench I2V beta (test) | Subject Consistency | 96.3 | 9 |
| 3D Awareness | CO3D v2 | Point Error | 0.391 | 7 |