Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

About

Generative modeling aims to transform random noise into structured outputs. In this work, we enhance video diffusion models by allowing motion control via structured latent noise sampling. This is achieved by just a change in data: we pre-process training videos to yield structured noise. Consequently, our method is agnostic to diffusion model design, requiring no changes to model architectures or training pipelines. Specifically, we propose a novel noise warping algorithm, fast enough to run in real time, that replaces random temporal Gaussianity with correlated warped noise derived from optical flow fields, while preserving the spatial Gaussianity. The efficiency of our algorithm enables us to fine-tune modern video diffusion base models using warped noise with minimal overhead, and provide a one-stop solution for a wide range of user-friendly motion control: local object motion control, global camera movement control, and motion transfer. The harmonization between temporal coherence and spatial Gaussianity in our warped noise leads to effective motion control while maintaining per-frame pixel quality. Extensive experiments and user studies demonstrate the advantages of our method, making it a robust and scalable approach for controlling motion in video diffusion models. Video results are available on our webpage: https://eyeline-labs.github.io/Go-with-the-Flow. Source code and model checkpoints are available on GitHub: https://github.com/Eyeline-Labs/Go-with-the-Flow.
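The idea of carrying noise along optical flow while keeping each frame spatially Gaussian can be illustrated with a small sketch. This is not the authors' algorithm (see their repository for that); it is a simplified nearest-neighbor scatter, and the names `warp_noise` and `make_noise`, the `(dx, dy)` flow layout, and the rounding scheme are all assumptions made for illustration. Each source sample is pushed to its rounded target pixel, colliding samples are summed and divided by the square root of their count so the variance stays at 1, and pixels that receive nothing are filled with fresh noise.

```python
import math
import random


def warp_noise(prev_noise, flow, rng):
    """Carry one frame of unit-variance noise to the next frame along a flow field.

    Hypothetical helper, not the paper's implementation.
    prev_noise: H x W nested list of floats, assumed i.i.d. N(0, 1).
    flow:       H x W nested list of (dx, dy) pixel displacements that map
                each pixel of the previous frame into the next frame.
    rng:        random.Random instance used to fill pixels nothing maps to.
    """
    height, width = len(prev_noise), len(prev_noise[0])
    sums = [[0.0] * width for _ in range(height)]
    counts = [[0] * width for _ in range(height)]

    # Scatter: push every source sample to its rounded target location.
    # Samples that land outside the frame are dropped.
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            tx, ty = x + round(dx), y + round(dy)
            if 0 <= tx < width and 0 <= ty < height:
                sums[ty][tx] += prev_noise[y][x]
                counts[ty][tx] += 1

    # Normalize: a sum of k independent N(0, 1) samples has variance k, so
    # dividing by sqrt(k) restores unit variance. Empty targets get fresh noise.
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            k = counts[y][x]
            if k:
                out[y][x] = sums[y][x] / math.sqrt(k)
            else:
                out[y][x] = rng.gauss(0.0, 1.0)
    return out


def make_noise(height, width, rng):
    """Sample an H x W grid of i.i.d. standard Gaussian noise."""
    return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]


# Usage: shift every pixel one step to the right.
rng = random.Random(0)
noise = make_noise(4, 5, rng)
flow = [[(1, 0)] * 5 for _ in range(4)]
warped = warp_noise(noise, flow, rng)
```

Because every target pixel is built from a disjoint set of independent source samples, the output frame stays i.i.d. Gaussian, while pixels that follow the flow keep their values from the previous frame and so are correlated over time. A production version would be vectorized on the GPU and would handle sub-pixel motion, which this sketch ignores.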

Ryan Burgert, Yuancheng Xu, Wenqi Xian, Oliver Pilarski, Pascal Clausen, Mingming He, Li Ma, Yitong Deng, Lingxiao Li, Mohsen Mousavi, Michael Ryoo, Paul Debevec, Ning Yu • 2025

Related benchmarks

Task	Dataset	Metric	Result
HOI Video Generation	HOIGen-1M 1.0 (test)	CLIPSIM	0.3035	14
Fashion Video Generation	UBC Fashion (test)	LPIPS	0.241	10
Controllable Video Generation	LongVGenBench (test)	Appearance Quality (A.Q.)	53.59	8
Motion Transfer	DAVIS (val)	PSNR	15.62	8
Motion Transfer	Sora Demo Subset	PSNR	14.59	8
Image-to-Video Generation	VBench I2V Sora subset	Subject Consistency (I2V)	95.7	8
Camera-controlled Video Generation	DL3DV 160 samples subset 10K (val)	CLIP Consistency	0.984	7
First-frame-guided video editing	I2V-Edit-Benchmark	CLIP Score	0.93	7
Multi-object motion transfer	Custom multi-object motion transfer 200 sequences (test)	AC (Automatic)	77.4	6
Video Editing	User Study	Editing Consistency Score	72.5	6

Showing 10 of 22 rows
