
Omni-Video: Democratizing Unified Video Understanding and Generation

About

Notable breakthroughs in unified understanding and generation modeling have led to remarkable advances in image understanding, reasoning, generation, and editing. Yet current foundational models focus predominantly on images, leaving a gap in unified models for video understanding and generation. This report presents Omni-Video, an efficient and effective unified framework for video understanding, generation, and instruction-based editing. Our key insight is to teach existing multimodal large language models (MLLMs) to produce continuous visual clues, which serve as the input to diffusion decoders that generate high-quality videos conditioned on these clues. To fully unlock the potential of our system for unified video modeling, we integrate several technical improvements: 1) a lightweight architectural design that attaches a vision head on top of the MLLM and an adapter before the input of the diffusion decoder, where the former produces visual tokens and the latter adapts these tokens to the conditional space of the diffusion decoder; and 2) an efficient multi-stage training scheme that establishes a fast connection between MLLMs and diffusion decoders with limited data and computational resources. We empirically demonstrate that our model shows satisfactory generalization across video generation, editing, and understanding tasks.
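The data flow described above (MLLM hidden states, then a vision head producing continuous visual tokens, then an adapter mapping them into the diffusion decoder's conditioning space) can be illustrated with a minimal, dependency-free sketch. The layer choices are assumptions, not the authors' design: the vision head is taken to be a single linear layer, the adapter a two-layer GELU MLP, the MLLM output is replaced by random vectors, and all dimensions are toy values. `VisionHead`, `Adapter`, and the dimension constants are hypothetical names.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def init_linear(in_dim: int, out_dim: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Small random weights of shape (out_dim, in_dim) and a zero bias."""
    weight = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Affine map y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def gelu(v: float) -> float:
    """tanh approximation of GELU."""
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))


class VisionHead:
    """Hypothetical vision head (assumed: one linear layer): MLLM hidden states -> visual tokens."""

    def __init__(self, hidden_dim: int, visual_dim: int, rng: random.Random):
        self.w, self.b = init_linear(hidden_dim, visual_dim, rng)

    def num_params(self) -> int:
        return len(self.w) * len(self.w[0]) + len(self.b)

    def __call__(self, hidden_states: List[Vector]) -> List[Vector]:
        return [linear(h, self.w, self.b) for h in hidden_states]


class Adapter:
    """Hypothetical adapter (assumed: two-layer GELU MLP): visual tokens -> decoder conditions."""

    def __init__(self, visual_dim: int, cond_dim: int, rng: random.Random):
        self.w1, self.b1 = init_linear(visual_dim, cond_dim, rng)
        self.w2, self.b2 = init_linear(cond_dim, cond_dim, rng)

    def num_params(self) -> int:
        layers = ((self.w1, self.b1), (self.w2, self.b2))
        return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

    def __call__(self, tokens: List[Vector]) -> List[Vector]:
        out = []
        for t in tokens:
            h = [gelu(v) for v in linear(t, self.w1, self.b1)]
            out.append(linear(h, self.w2, self.b2))
        return out


# Toy dimensions (assumed); real MLLM and diffusion widths are far larger.
HIDDEN_DIM, VISUAL_DIM, COND_DIM, NUM_TOKENS = 16, 8, 12, 4
rng = random.Random(0)

# Random stand-in for the last-layer hidden states an MLLM would emit.
hidden_states = [[rng.gauss(0.0, 1.0) for _ in range(HIDDEN_DIM)] for _ in range(NUM_TOKENS)]

head = VisionHead(HIDDEN_DIM, VISUAL_DIM, rng)
adapter = Adapter(VISUAL_DIM, COND_DIM, rng)

visual_tokens = head(hidden_states)  # continuous "visual clues"
conditions = adapter(visual_tokens)  # what would be passed to the diffusion decoder

print(len(conditions), len(conditions[0]))      # 4 12
print(head.num_params(), adapter.num_params())  # 136 264
```

In this sketch the head and adapter are the only new parameters, which is one plausible reading of the "lightweight" claim: the connector is small compared with the MLLM and diffusion decoder it bridges.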

Zhiyu Tan, Hao Yang, Luozheng Qin, Jia Gong, Mengping Yang, Hao Li · 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text-to-Image Generation | GenEval | Overall Score | 75 | 391 |
| Video Editing | OpenVE-Bench | Overall Score | 1.19 | 22 |
| Video Generation | Video Generation | Sampling Time (s) | 216 | 21 |
| Instruction-Guided Video Editing | OpenVE-Bench | Overall Score | 3.66 | 17 |
| Video Editing | OpenVE-Bench (test) | Overall Score | 3.66 | 16 |
| Instruction-Guided Video Editing | OpenVE-Bench 1.0 (full) | Overall Quality | 1.31 | 16 |
| Instruction-only Video Editing (Add) | VIE-Bench | Instruction Following | 5.699 | 15 |
| Video Editing | VIE-Bench | Instruction Following | 6.004 | 11 |
| Video Editing | VIE-Bench Swap Change | Instruction Following Score | 4.733 | 10 |
| Video Instruction Editing | InsEdit-Bench | Overall Score | 4.13 | 9 |

(10 of 25 rows shown.)
