Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists

About

Recent advancements in video generation have spurred the development of video editing techniques, which can be divided into inversion-based and end-to-end methods. However, current video editing methods still face several challenges. Inversion-based methods, though training-free and flexible, are slow at inference, struggle with fine-grained editing instructions, and produce artifacts and jitter. End-to-end methods, which rely on edited video pairs for training, offer faster inference but often produce poor editing results because high-quality training video pairs are scarce. In this paper, to close this gap for end-to-end methods, we introduce Señorita-2M, a high-quality video editing dataset. Señorita-2M consists of approximately 2 million video editing pairs. It is built using four high-quality, specialized video editing models, each designed and trained by our team to achieve state-of-the-art editing results. We also propose a filtering pipeline to eliminate poorly edited video pairs. Furthermore, we explore common video editing architectures to identify the most effective structure based on current pre-trained generative models. Extensive experiments show that our dataset helps yield remarkably high-quality video editing results. More details are available at https://senorita-2m-dataset.github.io.

Bojia Zi, Penghui Ruan, Marco Chen, Xianbiao Qi, Shaozhe Hao, Shihao Zhao, Youze Huang, Bin Liang, Rong Xiao, Kam-Fai Wong • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Generation Quality | PointBench | Success Rate (%) | 56.18 | 18 |
| Video Editing | EditVerseBench Appearance (test) | Pick Score | 19.69 | 12 |
| Video Editing | EditVerseBench 125 videos | CLIP Score | 98.9 | 11 |
| Video Editing | EditVerse latest (full) | Editing Quality | 6.45 | 11 |
| Video Editing | TGVE benchmark | Pick Score | 20.54 | 11 |
| Video Editing | EgoEditBench | VLM Score | 7.52 | 10 |
| Mask-based video object insertion | Internal (test) | MSE | 106.3 | 9 |
| Point-based video object insertion | DAVIS (test) | Acc Pos | 49.63 | 9 |
| Video Object Removal | BridgeRemoval-Bench | CLIP-T | 0.292 | 7 |
| Video Object Removal | DAVIS 2016 | CLIP-T | 0.2618 | 7 |
Showing 10 of 14 rows
