Inference-time Physics Alignment of Video Generative Models with Latent World Models
About
State-of-the-art video generative models produce promising visual content yet often violate basic physics principles, limiting their utility. While some attribute this deficiency to insufficient physics understanding from pre-training, we find that the shortfall in physics plausibility also stems from suboptimal inference strategies. We therefore introduce WMReward and treat improving the physics plausibility of video generation as an inference-time alignment problem. In particular, we leverage the strong physics prior of a latent world model (here, VJEPA-2) as a reward to search over and steer multiple candidate denoising trajectories, which allows test-time compute to be scaled for better generation performance. Empirically, our approach substantially improves physics plausibility across image-conditioned, multiframe-conditioned, and text-conditioned generation settings, as validated by a human preference study. Notably, in the ICCV 2025 Perception Test PhysicsIQ Challenge, we achieve a final score of 62.64%, winning first place and outperforming the previous state of the art by 7.42%. Our work demonstrates the viability of using latent world models to improve the physics plausibility of video generation, beyond this specific instantiation or parameterization.
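The "search" half of this idea can be illustrated with a best-of-N sketch: sample several candidates, score each with a reward, and keep the highest-scoring one. The code below is a toy illustration under stated assumptions, not the paper's implementation. `toy_generate`, `toy_reward`, and `best_of_n` are hypothetical names; the "video" is a 1-D list of positions, and the reward is the negative error of a constant-velocity predictor standing in for a latent world model. Steering of denoising trajectories is not shown.

```python
import random
from typing import Callable, List, Sequence, Tuple

Video = List[float]  # toy stand-in: a "video" is a 1-D trajectory of positions


def toy_generate(rng: random.Random, num_frames: int = 8) -> Video:
    """Hypothetical stand-in for sampling one candidate from a video generator."""
    pos, vel = 0.0, 1.0
    frames = []
    for _ in range(num_frames):
        frames.append(pos)
        vel += rng.gauss(0.0, 0.5)  # velocity jitter plays the role of physics violations
        pos += vel
    return frames


def toy_reward(video: Sequence[float]) -> float:
    """Hypothetical reward: negative squared error of a constant-velocity predictor.

    Assumption: a real system would score the candidate with a latent world
    model instead of this hand-written predictor.
    """
    error = 0.0
    for t in range(2, len(video)):
        predicted = 2 * video[t - 1] - video[t - 2]  # extrapolate from last two frames
        error += (video[t] - predicted) ** 2
    return -error


def best_of_n(
    generate: Callable[[random.Random], Video],
    reward: Callable[[Sequence[float]], float],
    n: int,
    seed: int = 0,
) -> Tuple[Video, float]:
    """Sample n candidates and return the one with the highest reward."""
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n)]
    scored = [(reward(c), c) for c in candidates]
    best_reward, best_video = max(scored, key=lambda pair: pair[0])
    return best_video, best_reward


if __name__ == "__main__":
    for n in (1, 4, 16):
        _, r = best_of_n(toy_generate, toy_reward, n=n, seed=0)
        print(f"n={n:2d}  best reward={r:.3f}")
```

With a fixed seed, the candidates drawn for a smaller `n` are a prefix of those drawn for a larger `n`, so the best reward never decreases as `n` grows. That is the sense in which spending more test-time compute can only help under this selection rule.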
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Video Generation | Physics-IQ | Phys. IQ Score | 62 | 63 |
| Text-to-Video Generation | VideoPhy | -- | -- | 41 |
| Text-to-Video Generation | VideoPhy2 HARD | PC Score | 70 | 28 |
| Physical Plausibility Evaluation | VideoPhy Hard 2 | PC Score | 52.2 | 20 |
| Text-to-Video Generation | VideoPhy2 (Standard) | SA Score | 63.9 | 18 |
| Physical and Semantic Video Evaluation | VideoPhy2 (Standard) | SA Score | 67.3 | 12 |
| Physical Video Evaluation | Physics-IQ | Score | 35.5 | 12 |
| Human Preference Evaluation | PhysicsIQ 1.0 (test) | Physics Plausibility Win Rate | 54.9 | 4 |
| Human Preference Evaluation | VideoPhy 1.0 (test) | Physics Plausibility Win Rate | 59.3 | 4 |