Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

About

We introduce Self Forcing, a novel training paradigm for autoregressive video diffusion models. It addresses the longstanding issue of exposure bias, where models trained on ground-truth context must generate sequences conditioned on their own imperfect outputs during inference. Unlike prior methods that denoise future frames based on ground-truth context frames, Self Forcing conditions each frame's generation on previously self-generated outputs by performing autoregressive rollout with key-value (KV) caching during training. This strategy enables supervision through a holistic loss at the video level that directly evaluates the quality of the entire generated sequence, rather than relying solely on traditional frame-wise objectives. To ensure training efficiency, we employ a few-step diffusion model along with a stochastic gradient truncation strategy, effectively balancing computational cost and performance. We further introduce a rolling KV cache mechanism that enables efficient autoregressive video extrapolation. Extensive experiments demonstrate that our approach achieves real-time streaming video generation with sub-second latency on a single GPU, while matching or even surpassing the generation quality of significantly slower and non-causal diffusion models. Project website: http://self-forcing.github.io/

Xun Huang, Zhengqi Li, Guande He, Mingyuan Zhou, Eli Shechtman • 2025
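To make the rollout idea in the abstract concrete, here is a minimal toy sketch of generating each frame from previously self-generated frames held in a fixed-size rolling cache. It is not the authors' implementation. The names `RollingKVCache`, `toy_generate_frame`, and `rollout` are hypothetical, frames are plain floats rather than tensors, and the "denoiser" is an arbitrary stand-in function chosen only so the control flow can be followed by hand.

```python
from collections import deque


class RollingKVCache:
    """Toy fixed-capacity cache holding (key, value) entries for recent frames."""

    def __init__(self, max_frames):
        # A deque with maxlen drops the oldest entry once capacity is reached.
        self.entries = deque(maxlen=max_frames)

    def append(self, key, value):
        self.entries.append((key, value))

    def context(self):
        return list(self.entries)


def toy_generate_frame(context):
    # Hypothetical stand-in for a few-step denoiser (an assumption, not the
    # paper's model): the next frame is 1 + the mean of the cached values.
    if not context:
        return 0.0
    return 1.0 + sum(value for _, value in context) / len(context)


def rollout(num_frames, max_frames):
    """Generate frames one by one, conditioning only on self-generated frames."""
    cache = RollingKVCache(max_frames)
    frames = []
    for t in range(num_frames):
        frame = toy_generate_frame(cache.context())  # no ground-truth context
        frames.append(frame)
        cache.append(t, frame)  # the model's own output becomes future context
    return frames, cache


frames, cache = rollout(num_frames=5, max_frames=2)
print(frames)
print([key for key, _ in cache.context()])
```

Because the cache capacity is fixed, memory stays constant however long the rollout runs, which is the property a rolling cache would need for open-ended extrapolation. In an actual training setup, a video-level loss would be computed on the full list of generated frames; that part is omitted here.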

Related benchmarks

Task	Dataset	Result
Text-to-Video Generation	VBench	Quality Score 85.25	168
Video Generation	VBench	--	126
Long Video Generation	VBench-Long 60 seconds	Subject Consistency 97.4	74
Video Generation	VBench 5s	Quality Score 85.07	73
Video Generation	VBench (test)	Semantic Score 81.28	66
Video Generation	VBench Long	Motion Smoothness 98.27	49
Video Generation	short videos 81-frames 240 prompts	Total Score 5.75	38
Text-to-Video Generation	VBench (test)	Total Score 83.46	37
Long Video Generation	VBench	Overall Score 97.14	35
Video Generation	VideoAlign	VQ Score 3.8	26

Showing 10 of 122 rows
