Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation

About

To achieve real-time interactive video generation, current methods distill pretrained bidirectional video diffusion models into few-step autoregressive (AR) models, facing an architectural gap when full attention is replaced by causal attention. However, existing approaches do not bridge this gap theoretically. They initialize the AR student via ODE distillation, which requires frame-level injectivity: each noisy frame must map to a unique clean frame under the PF-ODE of an AR teacher. Distilling an AR student from a bidirectional teacher violates this condition, preventing recovery of the teacher's flow map and instead inducing a conditional-expectation solution, which degrades performance. To address this issue, we propose Causal Forcing, which uses an AR teacher for ODE initialization, thereby bridging the architectural gap. Empirical results show that our method outperforms all baselines across all metrics, surpassing the SOTA Self Forcing by 19.3% in Dynamic Degree, 8.7% in VisionReward, and 16.7% in Instruction Following. Project page and code: https://thu-ml.github.io/CausalForcing.github.io/

Hongzhou Zhu, Min Zhao, Guande He, Hang Su, Chongxuan Li, Jun Zhu • 2026
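
The abstract's claim that a violated injectivity condition induces a conditional-expectation solution can be illustrated with a toy regression. The sketch below is not the paper's method or code: it uses hypothetical scalar "frames" and a hypothetical helper, `mse_optimal_predictor`, and it assumes the student is trained with a squared-error objective. Under that assumption, the best prediction for an input is the mean of all targets paired with it, so an input paired with two different targets is mapped to their average rather than to either one.

```python
from collections import defaultdict
from statistics import fmean


def mse_optimal_predictor(pairs):
    """Return the squared-error-optimal lookup table for (input, target) pairs.

    For each distinct input, the value minimizing mean squared error is the
    mean of the targets observed with that input (the conditional expectation).
    """
    targets = defaultdict(list)
    for x, y in pairs:
        targets[x].append(y)
    return {x: fmean(ys) for x, ys in targets.items()}


# Injective toy data (assumption): every noisy input has exactly one clean target.
injective_pairs = [(0.0, -1.0), (1.0, 1.0)]

# Non-injective toy data (assumption): the same noisy input is paired with
# two different clean targets.
ambiguous_pairs = [(0.5, -1.0), (0.5, 1.0)]

injective_fit = mse_optimal_predictor(injective_pairs)
ambiguous_fit = mse_optimal_predictor(ambiguous_pairs)

print(injective_fit)  # each input recovers its own target
print(ambiguous_fit)  # the shared input collapses to the average of its targets
```

In the injective case the fitted map reproduces every target exactly. In the ambiguous case the prediction lies between the two targets and matches neither, which is the toy analogue of the blurring effect the abstract attributes to the conditional-expectation solution.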

Related benchmarks

| Task                         | Dataset                                   | Metric         | Result | Rank |
|------------------------------|-------------------------------------------|----------------|--------|------|
| Video Generation             | VBench 5s                                 | Total Score    | 84.62  | 58   |
| Video Generation             | short videos 81-frames 240 prompts        | Total Score    | 5.4    | 38   |
| Video Generation             | VBench Long                               | Semantic Score | 76.91  | 23   |
| Video Generation             | VideoAlign                                | VQ Score       | 3.97   | 20   |
| Long Video Generation        | 120, 240, 720 and 1440-frames long videos | Total Score    | 3.86   | 20   |
| Video Generation             | VBench                                    | Total Score    | 84.04  | 14   |
| Video Generation             | VBench 20s generation                     | Throughput     | 17     | 10   |
| Interactive Video Generation | VBench                                    | Total Score    | 84.04  | 9    |
| Long Video Generation        | Multi-prompt Long Video 0-60s (test)      | Quality Score  | 84.12  | 8    |
| Video Generation             | VBench                                    | Quality Score  | 85.27  | 6    |
