Beyond Rigid: Benchmarking Non-Rigid Video Editing

About

Despite the remarkable progress in text-driven video editing, generating coherent non-rigid deformations remains a critical challenge, often plagued by physical distortion and temporal flicker. To bridge this gap, we propose NRVBench, the first dedicated and comprehensive benchmark designed to evaluate non-rigid video editing. First, we curate a high-quality dataset consisting of 180 non-rigid motion videos from six physics-based categories, equipped with 2,340 fine-grained task instructions and 360 multiple-choice questions. Second, we propose NRVE-Acc, a novel evaluation metric based on Vision-Language Models that can rigorously assess physical compliance, temporal consistency, and instruction alignment, overcoming the limitations of general metrics in capturing complex dynamics. Third, we introduce a training-free baseline, VM-Edit, which utilizes a dual-region denoising mechanism to achieve structure-aware control, balancing structural preservation and dynamic deformation. Extensive experiments demonstrate that while current methods have shortcomings in maintaining physical plausibility, our method achieves excellent performance across both standard and proposed metrics. We believe the benchmark could serve as a standard testing platform for advancing physics-aware video editing.
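The abstract does not spell out how NRVE-Acc is computed. As a rough illustration only, assuming that one ingredient is plain accuracy of a Vision-Language Model's answers on the multiple-choice questions, grouped by motion category, the scoring step might look like the sketch below. The function name, record layout, and category labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Compute per-category and overall multiple-choice accuracy.

    records: iterable of (category, predicted_choice, correct_choice).
    This layout is an assumption made for illustration, not NRVBench's format.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for category, predicted, correct in records:
        totals[category] += 1
        hits[category] += int(predicted == correct)
    if not totals:
        raise ValueError("no records to score")
    per_category = {c: hits[c] / totals[c] for c in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return per_category, overall


# Toy data: category names and answers are made up.
records = [
    ("category_a", "B", "B"),
    ("category_a", "C", "A"),
    ("category_b", "D", "D"),
    ("category_b", "A", "A"),
]
per_category, overall = accuracy_by_category(records)
print(per_category, overall)
```

Reporting the per-category breakdown next to the overall figure would show whether a method fails on one kind of deformation in particular, which a single aggregate number can hide.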

Bingzheng Qu, Kehai Chen, Xuefeng Bai, Jun Yu, Min Zhang • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Video Editing | NRVBench V1 (full) | Distortion (×10³) | 8.69 | 14 |
| Video Editing | NRVBench V0 (pilot) | Distortion (×10³) | 16.45 | 7 |
| Video Editing | Dataset 15 × 3 × 150 frames V0 | Distance (×10³) | 16.45 | 7 |
| Video Editing | V0 | S_phy | 71.89 | 6 |
| Video Editing | NRVBench | S_phy | 71.44 | 6 |
| Video Editing | V1 | S_phy | 71.44 | 6 |