OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing
About
The quality and diversity of instruction-based image editing datasets are continuously increasing, yet large-scale, high-quality datasets for instruction-based video editing remain scarce. To address this gap, we introduce OpenVE-3M, an open-source, large-scale, and high-quality dataset for instruction-based video editing. It comprises two primary categories: spatially-aligned edits (Global Style, Background Change, Local Change, Local Remove, Local Add, and Subtitles Edit) and non-spatially-aligned edits (Camera Multi-Shot Edit and Creative Edit). All edit types are generated via a meticulously designed data pipeline with rigorous quality filtering. OpenVE-3M surpasses existing open-source datasets in terms of scale, diversity of edit types, instruction length, and overall quality. Furthermore, to address the lack of a unified benchmark in the field, we construct OpenVE-Bench, containing 431 video-edit pairs that cover a diverse range of editing tasks with three key metrics highly aligned with human judgment. We present OpenVE-Edit, a 5B model trained on our dataset that demonstrates remarkable efficiency and effectiveness by setting a new state-of-the-art on OpenVE-Bench, outperforming all prior open-source models including a 14B baseline. Project page is at https://lewandofskee.github.io/projects/OpenVE.
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Video Editing | OpenVE-Bench (test) | Overall Score: 3.89 | 16 |
| Instruction-Guided Video Editing | OpenVE-Bench 1.0 (full) | Overall Quality: 2.49 | 16 |
| Instruction-Guided Video Editing | OpenVE-Bench | Overall Score: 2.49 | 8 |
| Video Editing | OpenVE-Bench 1.0 (test) | Overall Score: 3.89 | 8 |
| Video Editing Evaluation | OpenVE-Bench Video Paris 1.0 | Overall Score: 3.54 | 8 |