LIVEditor-14B: Lightning Unified Video Editing via In-Context Sparse Attention
About
Video editing has evolved toward In-Context Learning (ICL) paradigms, yet the resulting quadratic attention costs create a critical computational bottleneck. In this work, we propose In-context Sparse Attention (ISA), the first near-lossless empirical sparse framework tailored for ICL video editing. Our design is grounded in two key insights: first, context tokens exhibit significantly lower saliency than source tokens; second, we theoretically prove and empirically validate that query sharpness correlates with approximation error. Motivated by these findings, ISA applies an efficient pre-selection strategy to prune redundant context, followed by a dynamic query grouping mechanism that routes high-error queries to full attention and low-error queries to a computationally efficient 0-th order Taylor sparse attention. Furthermore, we build LIVEditor-14B, a novel lightning video editing model based on ISA and a proposed video-editing data pipeline used to curate a high-quality 1.7M dataset. Extensive experiments demonstrate that LIVEditor-14B achieves a ~60% reduction in attention-module latency while surpassing state-of-the-art methods across EditVerseBench, IVE-Bench, and VIE-Bench, delivering near-lossless acceleration without compromising visual fidelity.
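The three steps named in the abstract (context pre-selection, query grouping by sharpness, and routing to full or approximate attention) can be illustrated with a small NumPy sketch. This is not the authors' implementation. Every function name and parameter here (`preselect_context`, `block_mean_attention`, `isa_sketch`, `keep`, `block`, `full_frac`) is hypothetical, and three modelling choices are assumptions: saliency is taken as the mean attention weight a context token receives, sharpness as the peak of a block-level softmax, and the "0-th order Taylor" branch as replacing every key in a block by the block mean.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(q, k, v):
    # Standard scaled dot-product attention over all keys.
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def block_means(x, block):
    assert x.shape[0] % block == 0, "sketch assumes length divisible by block"
    return x.reshape(-1, block, x.shape[-1]).mean(axis=1)

def preselect_context(q, k_ctx, v_ctx, keep):
    # Assumed saliency: mean attention weight each context token receives.
    # Computed exactly here for clarity; a real system would estimate it cheaply.
    saliency = softmax(q @ k_ctx.T / np.sqrt(q.shape[-1])).mean(axis=0)
    idx = np.sort(np.argsort(saliency)[-keep:])  # keep top tokens, original order
    return k_ctx[idx], v_ctx[idx]

def block_mean_attention(q, k, v, block):
    # Assumed zeroth-order approximation: exp(q.k_j) ~ exp(q.mean_k_of_block).
    # With equal-size blocks the block size cancels in the softmax ratio,
    # so attention reduces to a softmax over block-mean keys and values.
    kb, vb = block_means(k, block), block_means(v, block)
    return softmax(q @ kb.T / np.sqrt(q.shape[-1])) @ vb

def isa_sketch(q, k_src, v_src, k_ctx, v_ctx, keep, block, full_frac):
    # Step 1: prune low-saliency context tokens; source tokens are all kept.
    k_ctx, v_ctx = preselect_context(q, k_ctx, v_ctx, keep)
    k = np.concatenate([k_src, k_ctx])
    v = np.concatenate([v_src, v_ctx])
    # Step 2: assumed sharpness proxy = peak of the block-level softmax.
    kb = block_means(k, block)
    sharp = softmax(q @ kb.T / np.sqrt(q.shape[-1])).max(axis=1)
    order = np.argsort(-sharp)
    n_full = int(round(full_frac * len(q)))
    full_idx, sparse_idx = order[:n_full], order[n_full:]
    # Step 3: sharp queries get full attention, the rest the approximation.
    out = np.empty((len(q), v.shape[-1]))
    out[full_idx] = full_attention(q[full_idx], k, v)
    out[sparse_idx] = block_mean_attention(q[sparse_idx], k, v, block)
    return out, full_idx
```

With `full_frac=1.0` the sketch reduces to full attention over the pruned key set, and when all keys inside a block are identical the block-mean branch matches full attention exactly, which is the sense in which the approximation error depends on how much the attention distribution varies within a block.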
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Video Editing | VIE-Bench | Instruction Following | 5.55 | 18 |
| Video Editing | IVE-Bench | Total Score | 67 | 10 |
| Video Editing | EditVerseBench (test) | Quality Score | 7.89 | 8 |
| Video Editing | VIE-Bench Swap | Follow Score | 7.91 | 6 |
| Video Editing | VIE-Bench Add | Following Score | 8.87 | 5 |
| Video Editing | VIE-Bench Style | Instruction Following Score | 8.06 | 4 |
| Video Editing | VIE-Bench Hybrid | Follow Score | 8.1 | 4 |