Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k

About

Video generation models have achieved remarkable progress in the past year. The quality of AI video continues to improve, but at the cost of larger model size, increased data quantity, and greater demand for training compute. In this report, we present Open-Sora 2.0, a commercial-level video generation model trained for only $200k. With this model, we demonstrate that the cost of training a top-performing video generation model is highly controllable. We detail all techniques that contribute to this efficiency breakthrough, including data curation, model architecture, training strategy, and system optimization. According to human evaluation results and VBench scores, Open-Sora 2.0 is comparable to globally leading video generation models, including the open-source HunyuanVideo and the closed-source Runway Gen-3 Alpha. By making Open-Sora 2.0 fully open-source, we aim to democratize access to advanced video generation technology, fostering broader innovation and creativity in content creation. All resources are publicly available at: https://github.com/hpcaitech/Open-Sora.

Zangwei Zheng, Xiangyu Peng, Yuxuan Lou, Chenhui Shen, Tom Young, Xinying Guo, Binluo Wang, Hang Xu, Hongxin Liu, Mingyan Jiang, Wenjun Li, Yuhui Wang, Anbang Ye, Gang Ren, Qianran Ma, Wanying Liang, Xiang Lian, Xiwen Wu, Yuting Zhong, Zhuangyan Li, Chaoyu Gong, Guojun Lei, Leijun Cheng, Limin Zhang, Minghao Li, Ruijie Zhang, Silan Hu, Shijie Huang, Xiaokang Wang, Yuanheng Zhao, Yuqi Wang, Ziang Wei, Yang You • 2025

Related benchmarks

Task	Dataset	Result
Text-to-Video Generation	VBench	Quality Score 82.1	209
Video Generation	VBench	Quality Score 81.35	126
Video Generation	VBench	Total Score 81.71	48
Image-to-Video Generation	VBench	Motion Smoothness 0.9846	46
3D Geometry	DL3DV (val)	P-map Error 14.1	18
Instance Grouping	ScanNet (val)	T-SR 4.3	18
Semantic Tagging	ScanNet (val)	APmid 56.38	18
Text-to-Video Generation	VBench T2V 15	Total Score 83.6	17
Text-to-Video Generation	TV-Align (test)	Counting Alignment 28.9	10
Image-to-Video Generation	VBench I2V beta (test)	Subject Consistency 96.3	9

Showing 10 of 14 rows
