
Foresight Diffusion: Improving Sampling Consistency in Predictive Diffusion Models

About

Diffusion and flow-based models have enabled significant progress in generation tasks across various modalities and have recently found applications in predictive learning. However, unlike typical generation tasks that encourage sample diversity, predictive learning entails different sources of stochasticity and requires sampling consistency aligned with the ground-truth trajectory, a property we empirically find diffusion models to lack. We argue that a key bottleneck in learning sampling-consistent predictive diffusion models lies in suboptimal predictive ability, which we attribute to the entanglement of condition understanding and target denoising within shared architectures and co-training schemes. To address this, we propose Foresight Diffusion (ForeDiff), a framework for predictive diffusion models that improves sampling consistency by decoupling condition understanding from target denoising. ForeDiff incorporates a separate deterministic predictive stream to process conditioning inputs independently of the denoising stream, and further leverages a pretrained predictor to extract informative representations that guide generation. Extensive experiments on robot video prediction and scientific spatiotemporal forecasting show that ForeDiff improves both predictive accuracy and sampling consistency over strong baselines, offering a promising direction for predictive diffusion models.
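The abstract describes decoupling condition understanding (a deterministic predictive stream) from target denoising (a diffusion stream guided by the predictor's representation). The paper's actual architecture is not given here; the following is a minimal toy sketch of that decoupling idea, with all shapes, layer sizes, and function names hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(d_in, d_hidden, d_out, rng):
    """Random parameters for a two-layer MLP (illustrative only)."""
    return (rng.normal(0, 0.1, (d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(0, 0.1, (d_hidden, d_out)), np.zeros(d_out))

def mlp(params, x):
    """Two-layer MLP with tanh hidden activation."""
    w1, b1, w2, b2 = params
    return np.tanh(x @ w1 + b1) @ w2 + b2

d_cond, d_repr, d_target = 8, 16, 4  # hypothetical dimensions

# Deterministic predictive stream: processes the conditioning input
# on its own, independently of the denoising stream.
predictor = init_mlp(d_cond, 32, d_repr, rng)

# Denoising stream: receives the noisy target, a timestep scalar,
# and the predictor's representation as guidance.
denoiser = init_mlp(d_target + 1 + d_repr, 64, d_target, rng)

def denoise_step(cond, x_t, t):
    h = mlp(predictor, cond)             # condition understanding
    inp = np.concatenate([x_t, [t], h])  # guidance via concatenation
    return mlp(denoiser, inp)            # predicted denoising target

cond = rng.normal(size=d_cond)   # conditioning input (e.g. past frames)
x_t = rng.normal(size=d_target)  # noisy target at diffusion step t
eps_hat = denoise_step(cond, x_t, 0.5)
print(eps_hat.shape)  # (4,)
```

The point of the sketch is structural: the predictor's parameters never see the noisy target, so condition understanding is learned (or pretrained) separately from denoising, matching the decoupling the abstract argues for.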

Yu Zhang, Xingzhuo Guo, Haoran Xu, Jialong Wu, Mingsheng Long • 2025

Related benchmarks

Task                                       Dataset   Result           Rank
Video Prediction                           RoboNet   FVD 51.5         13
Scientific spatiotemporal forecasting      HeterNS   L2 Error 0.19    3
Video Prediction Calibration Evaluation    RoboNet   CRPS 0.0173      2
Video Prediction Calibration Evaluation    RT-1      CRPS 0.0128      2
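The calibration rows above report CRPS (continuous ranked probability score), a standard metric for probabilistic forecasts that rewards both accuracy and sharpness. For a finite ensemble it has the well-known empirical form CRPS = E|X − y| − ½·E|X − X′|. A minimal sketch (function name and inputs are illustrative, not from the paper):

```python
import numpy as np

def crps_ensemble(samples, obs):
    """Empirical CRPS of an ensemble against a scalar observation:
    CRPS = mean|X - y| - 0.5 * mean|X - X'|, with X, X' drawn from
    the ensemble (the biased estimator, for simplicity)."""
    samples = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(samples - obs))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term1 - term2

# A perfectly sharp, perfectly accurate ensemble scores 0.
print(crps_ensemble([1.0, 1.0, 1.0], 1.0))  # 0.0
```

Lower is better, which is why a sampling-consistent predictive model (ensemble members clustered near the ground truth) achieves a smaller CRPS than an equally accurate but more dispersed one.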
