One-Forcing: Towards Stable One-Step Autoregressive Video Generation

About

Recent advances have substantially improved real-time interactive video generation in the autoregressive regime. However, most existing few-step autoregressive video generation methods, often distilled from a corresponding many-step teacher, default to a 4-step sampling configuration. This still incurs considerable latency during deployment, and quality degrades severely when the number of sampling steps is reduced further, particularly in the one-step setting. Trajectory-style consistency distillation methods often produce videos with weak dynamics, while DMD-based approaches, such as Self-Forcing, tend to yield blurry frames. To address this challenge, we propose One-Forcing, a simple yet effective approach that augments the DMD objective with an auxiliary GAN loss for high-quality and efficient one-step video generation. Experiments on VBench show that One-Forcing achieves a total score of 83.76, establishing state-of-the-art performance among one-step causal video generation methods and remaining competitive with strong many-step approaches. We further demonstrate that one-step framewise autoregressive generation can be trained stably at merely one-third of the training cost of the chunkwise model, a setting in which prior methods have failed.
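
The central idea in the abstract, a DMD objective augmented with an auxiliary GAN loss, can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function names, the surrogate form of the DMD term, the non-saturating form of the GAN term, and the `gan_weight` value are hypothetical and are not taken from the paper. Plain Python lists stand in for tensors, and the scores and discriminator logits are passed in as precomputed numbers.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dmd_surrogate_loss(x, score_real, score_fake):
    """Surrogate loss whose gradient w.r.t. x is proportional to
    (score_fake - score_real), the usual distribution-matching direction.
    In an autograd framework, `target` would be wrapped in a stop-gradient."""
    grad = [sf - sr for sf, sr in zip(score_fake, score_real)]
    target = [xi - gi for xi, gi in zip(x, grad)]
    return 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, target)) / len(x)


def gan_generator_loss(disc_logits):
    # Non-saturating generator loss: mean softplus(-D(G(z))).
    # Assumed form; the abstract does not say which GAN loss is used.
    return sum(softplus(-logit) for logit in disc_logits) / len(disc_logits)


def combined_generator_loss(x, score_real, score_fake, disc_logits,
                            gan_weight=0.1):
    # gan_weight is a hypothetical hyperparameter, not a value from the paper.
    dmd = dmd_surrogate_loss(x, score_real, score_fake)
    gan = gan_generator_loss(disc_logits)
    return dmd + gan_weight * gan


if __name__ == "__main__":
    sample = [0.2, -0.4, 1.0]          # stand-in for a generated frame
    s_real = [0.1, 0.0, -0.3]          # stand-in teacher (real) score
    s_fake = [0.3, -0.2, 0.1]          # stand-in fake-distribution score
    logits = [-0.5, 0.8]               # stand-in discriminator outputs
    print(combined_generator_loss(sample, s_real, s_fake, logits))
```

When the real and fake scores agree, the DMD term vanishes and only the weighted GAN term remains. This is one way to read the abstract's motivation: the adversarial term keeps supplying a training signal aimed at sharper frames even where the distribution-matching signal is weak.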

Jiaqi Feng, Justin Cui, Yuanhao Ban, Cho-Jui Hsieh • 2026

Related benchmarks

Task	Dataset	Result	Rank
Text-to-Video Generation	VBench (test)	Total Score: 83.76	41
Human Preference	VBench 50 prompts	Ours Wins: 139	3
