Efficient Autoregressive Video Diffusion with Dummy Head

About

Autoregressive video diffusion models have recently gained considerable research interest due to their causal modeling and iterative denoising. In this work, we identify that the multi-head self-attention in these models under-utilizes historical frames: approximately 25% of heads attend almost exclusively to the current frame, and discarding their KV caches incurs only minor performance degradation. Building upon this, we propose Dummy Forcing, a simple yet effective method to control context accessibility across different heads. Specifically, the proposed heterogeneous memory allocation reduces head-wise context redundancy, accompanied by dynamic head programming to adaptively classify head types. Moreover, we develop a context packing technique to achieve more aggressive cache compression. Without additional training, Dummy Forcing delivers up to 2.0x speedup over the baseline, supporting video generation at 24.3 FPS with less than 0.5% quality drop. The project page is available at https://csguoh.github.io/project/DummyForcing/.
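
The observation about heads that attend almost exclusively to the current frame can be illustrated with a small sketch. This is not the authors' dynamic head programming. It is a simplified stand-in that assumes each head is scored by the average attention mass its queries place on current-frame keys, and that the top 25% of heads by that score are marked as "dummy" heads whose historical KV cache is skipped. The names `current_frame_mass`, `classify_heads`, `kept_history_entries`, and `dummy_ratio` are hypothetical and chosen only for this example.

```python
from typing import List


def current_frame_mass(attn_row: List[float], num_history_tokens: int) -> float:
    """Fraction of one query's attention that lands on current-frame keys.

    Assumption: the first `num_history_tokens` entries of `attn_row`
    correspond to cached (historical) keys, the rest to the current frame.
    """
    total = sum(attn_row)
    return sum(attn_row[num_history_tokens:]) / total


def classify_heads(
    attn: List[List[List[float]]],
    num_history_tokens: int,
    dummy_ratio: float = 0.25,
) -> List[bool]:
    """Return one flag per head; True marks a hypothetical 'dummy' head.

    attn[h][q][k] holds the attention weight of head h, query q, key k.
    Heads are ranked by mean current-frame mass and the top
    `dummy_ratio` fraction is flagged (a simple illustrative rule).
    """
    scores = []
    for head in attn:
        masses = [current_frame_mass(row, num_history_tokens) for row in head]
        scores.append(sum(masses) / len(masses))
    num_dummy = int(len(attn) * dummy_ratio)
    ranked = sorted(range(len(attn)), key=lambda h: scores[h], reverse=True)
    dummy = set(ranked[:num_dummy])
    return [h in dummy for h in range(len(attn))]


def kept_history_entries(flags: List[bool], num_history_tokens: int) -> int:
    """Count cached history entries kept when dummy heads drop their history."""
    return sum(0 if is_dummy else num_history_tokens for is_dummy in flags)


# Toy input: 4 heads, 1 query each, 2 history keys followed by 2 current keys.
attn = [
    [[0.40, 0.40, 0.10, 0.10]],  # mostly attends to history
    [[0.01, 0.01, 0.49, 0.49]],  # almost exclusively current frame
    [[0.25, 0.25, 0.25, 0.25]],  # evenly split
    [[0.30, 0.30, 0.20, 0.20]],  # leans toward history
]
flags = classify_heads(attn, num_history_tokens=2)
kept = kept_history_entries(flags, num_history_tokens=2)
```

In this toy input only the second head is flagged, so three of the four heads keep their two history entries. A real system would have to derive such scores from the model's actual attention maps, which this sketch does not attempt.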

Hang Guo, Zhaoyang Jia, Jiahao Li, Bin Li, Yuanhao Cai, Jiangshan Wang, Yawei Li, Yan Lu • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Long Video Generation | VBench-Long 60 seconds | Subject Consistency | 97.95 | 74 |
| Video Generation | VBench 5s | Quality Score | 84.63 | 73 |
| Video Generation | short videos 81-frames 240 prompts | Total Score | 5.45 | 38 |
| Long Video Generation | VBenchLong 30-second | Dynamic Degree | 50.47 | 22 |
| Long Video Generation | 120, 240, 720 and 1440-frames long videos | Total Score | 6.14 | 20 |
| Video Generation | VBench-Long 30s videos | FPS | 24.3 | 8 |
| Video Generation | VBench 5-second video generation | Chunk Discrepancy | 2.1 | 7 |
| Video Generation | VBench Self Forcing 5-second | Chunk Discrimination Score | 2.6 | 5 |
| Interactive Video Generation | VBench-Long 60-second | FPS | 25.74 | 3 |
