Beyond Rigid: Benchmarking Non-Rigid Video Editing

About

As video generation models are increasingly expected to manipulate physical dynamics, there is a growing need to move evaluation beyond appearance fidelity and semantic alignment. Non-rigid video editing offers a uniquely revealing testbed, where distinct materials impose distinct physical constraints. In this paper, we introduce NRVBench, a diagnostic benchmark for non-rigid video editing, where the task is to modify deformable motion while preserving irrelevant regions and maintaining material-specific plausibility. NRVBench contains 180 curated videos across six physics-grounded categories, 2,340 fine-grained editing instructions, 360 multiple-choice questions, and pixel-accurate masks. We further propose NRVE-Acc, a structured VLM-based protocol that decomposes editing success into instruction following, material-aware deformation plausibility, and temporal coherence with motion cues. Experiments on representative inference-time video editing methods reveal a clear mismatch between conventional metrics and physics-aware perceptual editing success: methods that preserve appearance or achieve strong global alignment may still fail under non-rigid dynamics. We additionally introduce VM-Edit, a simple region-conditioned editing baseline that frees the foreground while locking the background, exposing the stability-plasticity trade-off.
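The abstract does not say how VM-Edit frees the foreground while locking the background. As a rough illustration of the general idea only, the sketch below uses plain mask compositing: pixels inside a binary foreground mask come from the edited clip, and all other pixels are copied from the original. The names `composite_frame` and `composite_video`, the grayscale nested-list frame layout, and the compositing rule itself are assumptions made for this example, not the authors' method.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame: rows of pixel intensities
Mask = List[List[int]]     # 1 = editable foreground, 0 = locked background


def composite_frame(original: Frame, edited: Frame, mask: Mask) -> Frame:
    """Take edited pixels inside the mask and original pixels outside it."""
    return [
        [e if m else o for o, e, m in zip(o_row, e_row, m_row)]
        for o_row, e_row, m_row in zip(original, edited, mask)
    ]


def composite_video(
    original: List[Frame], edited: List[Frame], masks: List[Mask]
) -> List[Frame]:
    """Apply the per-frame composite to every frame of a clip.

    Hypothetical helper: assumes all three inputs are aligned frame by frame.
    """
    if not (len(original) == len(edited) == len(masks)):
        raise ValueError("clips and masks must have the same number of frames")
    return [composite_frame(o, e, m) for o, e, m in zip(original, edited, masks)]
```

Under this assumed rule the background is preserved exactly, so any remaining failure has to come from the deformation inside the mask. That is one simple way to picture the stability-plasticity trade-off the abstract mentions.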

Bingzheng Qu, Xuefeng Bai, Kehai Chen, Min Zhang • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Video Editing | NRVBench V1 (full) | Distortion (×10^3) | 8.69 | 14
Video Editing | NRVBench V0 (pilot) | Distortion (×10^3) | 16.45 | 7
Video Editing | Dataset 15 × 3 × 150 frames V0 | Distance (×10^3) | 16.45 | 7
Video Editing | V0 | S_phy | 71.89 | 6
Video Editing | NRVBench | S_phy | 71.44 | 6
Video Editing | V1 | S_phy | 71.44 | 6