OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning

About

While proprietary systems such as Seedance-2.0 have achieved remarkable success in omni-capable video generation, open-source alternatives lag significantly behind. Most academic models remain heavily fragmented, and the few existing efforts toward unified video generation still struggle to integrate diverse tasks seamlessly within a single framework. To bridge this gap, we propose OmniWeaving, an omni-level video generation model featuring powerful multimodal composition and reasoning-informed capabilities. By leveraging a massive-scale pretraining dataset that encompasses diverse compositional and reasoning-augmented scenarios, OmniWeaving learns to temporally bind interleaved text, multi-image, and video inputs while acting as an intelligent agent that infers complex user intentions for sophisticated video creation. Furthermore, we introduce IntelligentVBench, the first comprehensive benchmark designed to rigorously assess next-level intelligent unified video generation. Extensive experiments demonstrate that OmniWeaving achieves state-of-the-art (SoTA) performance among open-source unified models. The code and model are publicly available. Project Page: https://omniweaving.github.io.

Kaihang Pan, Qi Tian, Jianwei Zhang, Weijie Kong, Jiangfeng Xiong, Yanxin Long, Shixue Zhang, Haiyi Qiu, Tan Wang, Zheqi Lv, Yue Wu, Liefeng Bo, Siliang Tang, Zhao Zhong • 2026

Related benchmarks

Task	Dataset	Result
Text-to-Video Generation	VBench	Quality Score 84.37	209
Video Editing	OpenVE-Bench	Overall Score 3.15	39
Instruction-Guided Video Editing	OpenVE-Bench	Overall Score 2.92	29
Compositional Multi-Image-to-Video Generation	IntelligentVBench 1Subject with BKG	IF 4.35	21
Compositional Multi-Image-to-Video Generation	IntelligentVBench 2Subjects with BKG	IF Score 4.08	21
Compositional Multi-Image-to-Video Generation	IntelligentVBench 3Subjects with BKG	IF 3.53	21
Implicit Image-to-Video (Implicit I2V)	IntelligentVBench	IF Score 4.33	12
Controllable Video Generation	CogControlBench	AQ 51.2	9
Reference-guided Video Editing	RefVIE-Bench (test)	Identity Score 3.29	9
Interpolative Directed Image-to-Video (Interpolative DI2V)	IntelligentVBench	IF 4.74	8

Showing 10 of 21 rows
