
ASTRA: Let Arbitrary Subjects Transform in Video Editing

About

While existing video editing methods excel with single subjects, they struggle in dense, multi-subject scenes, frequently suffering from attention dilution and mask boundary entanglement that cause attribute leakage and temporal instability. To address this, we propose ASTRA, a training-free framework for seamless, arbitrary-subject video editing. Without requiring model fine-tuning, ASTRA precisely manipulates multiple designated subjects while strictly preserving non-target regions. It achieves this via two core components: a prompt-guided multimodal alignment module that generates robust conditions to mitigate attention dilution, and a prior-based mask retargeting module that produces temporally coherent mask sequences to resolve boundary entanglement. Functioning as a versatile plug-and-play module, ASTRA seamlessly integrates with diverse mask-driven video generators. Extensive experiments on our newly constructed benchmark, MSVBench, demonstrate that ASTRA consistently outperforms state-of-the-art methods. Code, models, and data are available at https://github.com/XWH-A/ASTRA.

Fei Shen, Weihao Xu, Rui Yan, Dong Zhang, Xiangbo Shu, Jinhui Tang, Maocheng Zhao • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Video Editing | MSVBench (test) | Warp Error | 1.85 | 10
Video Editing | LOVEU-TGVE 2023 | Warp-Err | 2.04 | 6
Video Editing | Video Editing Dataset | CLIP-T Score | 27.23 | 3