
Self-Refining Video Sampling

About

Modern video generators still struggle with complex physical dynamics, often falling short of physical realism. Existing approaches address this with external verifiers or additional training on augmented data, which is computationally expensive and still limited in capturing fine-grained motion. In this work, we present self-refining video sampling, a simple method that uses a pre-trained video generator, trained on large-scale datasets, as its own self-refiner. By interpreting the generator as a denoising autoencoder, we enable iterative inner-loop refinement at inference time without any external verifier or additional training. We further introduce an uncertainty-aware refinement strategy that selectively refines regions based on self-consistency, preventing artifacts caused by over-refinement. Experiments on state-of-the-art video generators demonstrate significant improvements in motion coherence and physics alignment, achieving over 70% human preference compared to the default and guidance-based samplers.
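The abstract's core loop can be sketched in code. The following is a hypothetical, heavily simplified illustration, not the paper's actual algorithm: a toy denoiser stands in for the pre-trained video model, and the inner loop re-noises and re-denoises the sample several times, updating only regions where repeated denoising passes agree (the uncertainty-aware, self-consistency-based refinement the abstract describes). All names (`toy_denoiser`, `self_refine`, `tau`) are invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x_noisy, sigma):
    # Stand-in for a pre-trained video denoiser: it simply shrinks the
    # sample toward zero in proportion to the noise level. A real model
    # would predict the clean video from (x_noisy, sigma).
    return x_noisy / (1.0 + sigma**2)

def self_refine(x, sigma=0.5, n_inner=4, n_consistency=3, tau=0.5):
    """Illustrative inner-loop refinement (hypothetical sketch).

    Each iteration re-noises the current sample to noise level `sigma`,
    denoises several independent re-noisings, and measures per-element
    self-consistency as the std-dev across the denoised candidates.
    Only elements whose uncertainty falls below `tau` are updated,
    guarding against over-refinement of ambiguous regions.
    """
    for _ in range(n_inner):
        candidates = []
        for _ in range(n_consistency):
            x_noisy = x + sigma * rng.standard_normal(x.shape)
            candidates.append(toy_denoiser(x_noisy, sigma))
        candidates = np.stack(candidates)
        mean = candidates.mean(axis=0)
        uncertainty = candidates.std(axis=0)
        confident = uncertainty < tau  # self-consistent regions only
        x = np.where(confident, mean, x)
    return x

x0 = rng.standard_normal((4, 8, 8))  # toy "video": frames x height x width
refined = self_refine(x0)
print(refined.shape)
```

Because the update is gated by the agreement of independent denoising passes, regions the model is uncertain about are left untouched rather than pushed toward a possibly wrong consensus, which is the intuition behind avoiding over-refinement artifacts.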

Sangwon Jang, Taekyung Ki, Jaehyeong Jo, Saining Xie, Jaehong Yoon, Sung Ju Hwang • 2026

Related benchmarks

Task | Dataset | Result | Rank
Video Generation | VideoJAM-bench | Motion Score: 98.84 | 10
Robotics Image-to-Video Generation | PAI-Bench-G | Grasp Success: 89.6 | 8
Text-to-Video Generation | Dynamic-bench | VBench Motion Score: 98.41 | 5
Video Generation | VideoPhy2 (hard and easy) | PC (Gemini3-F): 55.6 | 4
Video Generation | PhyWorldBench (kinematics and interaction dynamics domains) | PC Score (Gemini3-F): 40 | 4

Other info

GitHub
