History-Guided Video Diffusion

About

Classifier-free guidance (CFG) is a key technique for improving conditional generation in diffusion models, enabling more accurate control while enhancing sample quality. It is natural to extend this technique to video diffusion, which generates video conditioned on a variable number of context frames, collectively referred to as history. However, we find two key challenges to guiding with variable-length history: architectures that only support fixed-size conditioning, and the empirical observation that CFG-style history dropout performs poorly. To address this, we propose the Diffusion Forcing Transformer (DFoT), a video diffusion architecture and theoretically grounded training objective that jointly enable conditioning on a flexible number of history frames. We then introduce History Guidance, a family of guidance methods uniquely enabled by DFoT. We show that its simplest form, vanilla history guidance, already significantly improves video generation quality and temporal consistency. A more advanced method, history guidance across time and frequency, further enhances motion dynamics, enables compositional generalization to out-of-distribution history, and can stably roll out extremely long videos. Project website: https://boyuan.space/history-guidance
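
To make the guidance idea concrete, here is a minimal sketch of the standard CFG combination rule with history frames as the condition. It is an illustration under stated assumptions, not the authors' implementation: `toy_denoiser` is a hypothetical stand-in for a video denoiser, `cfg_combine` and `weight` are names chosen here, and the convention used (weight 1 recovers the conditional prediction) is one of several in common use.

```python
import numpy as np


def cfg_combine(pred_uncond, pred_cond, weight):
    """Standard classifier-free guidance combination.

    weight = 0 gives the unconditional prediction, weight = 1 the
    conditional one, and weight > 1 extrapolates past the conditional.
    """
    return pred_uncond + weight * (pred_cond - pred_uncond)


def toy_denoiser(noisy_frames, history=None):
    """Hypothetical stand-in for a video denoiser (NOT the DFoT model).

    Without history it returns a scaled copy of the input; with history
    it adds a term that depends on the mean history frame.
    """
    pred = 0.1 * noisy_frames
    if history is not None:
        pred = pred + (noisy_frames - history.mean(axis=0, keepdims=True))
    return pred


rng = np.random.default_rng(0)
history = rng.normal(size=(2, 4, 4))  # 2 context frames of size 4x4
noisy = rng.normal(size=(3, 4, 4))    # 3 frames being generated

pred_uncond = toy_denoiser(noisy)          # history dropped
pred_cond = toy_denoiser(noisy, history)   # conditioned on history
guided = cfg_combine(pred_uncond, pred_cond, weight=2.0)
```

A real model would run this combination at every denoising step. How the history-free prediction is obtained is model-specific, and the abstract reports that CFG-style history dropout performs poorly, which is the motivation given for DFoT.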

Kiwhan Song, Boyuan Chen, Max Simchowitz, Yilun Du, Russ Tedrake, Vincent Sitzmann • 2025

Related benchmarks

Task	Dataset	Result
Video Generation	Kinetics-600	FVD 4.3	22
Video Generation	RealEstate10K (Re10K) (test)	PSNR 22.395	16
Single-view Novel View Synthesis	DL3DV (Long-term (200th frame))	PSNR 13.51	13
Single-view Novel View Synthesis	RealEstate10K Long-term, 200th frame 84 (test)	PSNR 15.21	13
Single-view Novel View Synthesis	RealEstate10K Short-term, 50th frame 84 (test)	PSNR 18.53	13
Single-view Novel View Synthesis	DL3DV Short-term (50th frame)	PSNR 16.13	13
Object Navigation	AI2-THOR (Simulations)	Success Rate (SR) 26.05	12
Time Series Forecasting	GreenEarthNet 1.0 (test)	PSNR (NDVI) 15.71	9
Trajectory-based Video Generation	Random Loop Insertion	PSNR 20.62	8
Object Navigation	Simulations	SR (%) 26.05	8

Showing 10 of 47 rows
