VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation
About
While recent video diffusion models (VDMs) produce visually impressive results, they fundamentally struggle to maintain 3D structural consistency, often resulting in object deformation or spatial drift. We hypothesize that these failures arise because standard denoising objectives lack explicit incentives for geometric coherence. To address this, we introduce VideoGPA (Video Geometric Preference Alignment), a data-efficient self-supervised framework that leverages a geometry foundation model to automatically derive dense preference signals that guide VDMs via Direct Preference Optimization (DPO). This approach effectively steers the generative distribution toward inherent 3D consistency without requiring human annotations. VideoGPA significantly enhances temporal stability, geometric plausibility, and motion coherence using only a small number of preference pairs, consistently outperforming state-of-the-art baselines in extensive experiments.
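As a rough illustration of the idea described above, the following sketch builds one preference pair from geometry-based scores and evaluates the standard DPO objective on it. It is not the authors' implementation: the function names, the use of a scalar "consistency error" per sample, the best-versus-worst pairing rule, and the value of `beta` are all assumptions made for illustration, and the scalar log-likelihoods stand in for whatever likelihood surrogate a diffusion model would actually use.

```python
import math


def pick_preference_pair(samples, consistency_errors):
    """Hypothetical pairing rule: the sample with the lowest geometric
    consistency error is preferred, the one with the highest is rejected."""
    order = sorted(range(len(samples)), key=lambda i: consistency_errors[i])
    return samples[order[0]], samples[order[-1]]


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    logp_*     : log-likelihoods under the model being fine-tuned
    ref_logp_* : log-likelihoods under the frozen reference model
    beta       : strength of the implicit KL regularisation (assumed value)
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)), written in a numerically stable form
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Toy usage: three generated videos scored by a (hypothetical) geometry model.
videos = ["video_a", "video_b", "video_c"]
errors = [0.30, 0.10, 0.90]  # lower means more 3D-consistent
preferred, rejected = pick_preference_pair(videos, errors)
loss = dpo_loss(logp_w=-10.0, logp_l=-12.0, ref_logp_w=-11.0, ref_logp_l=-11.0)
```

The loss falls below log 2 once the fine-tuned model assigns relatively more likelihood to the preferred sample than the reference model does, which is the sense in which the preference signal steers the generative distribution.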
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Video Generation | VBench 1.0 (test) | -- | 21 |
| Video Generation | Standard Video Evaluation Benchmark | Subject Consistency: 0.8511 | 10 |
| Video Generation | DL3DV RealEstate10K Static-scene benchmark | Epipolar Consistency: 0.098 | 10 |
| Video Generation | MiraData Dynamic-scene benchmark | VQ Score: 42.5 | 8 |
| Text-to-Video Generation | DL3DV-10K 1K (test) | PSNR: 17.31 | 7 |
| Geometric Consistency Evaluation | Curated Prompt Set (OpenVid-1M & DL3DV) Dynamic Complex 1.0 (val) | MEt3R Score: 9.872 | 7 |
| Geometric Consistency Evaluation | Curated Prompt Set (OpenVid-1M & DL3DV) Static Simple 1.0 (val) | MEt3R: 6.11 | 7 |
| Geometric Consistency Evaluation | Curated Prompt Set (OpenVid-1M & DL3DV) Static Complex 1.0 (val) | MEt3R: 6.882 | 7 |
| Geometric Consistency Evaluation | Curated Prompt Set (OpenVid-1M & DL3DV) Dynamic Simple 1.0 (val) | MEt3R: 18.84 | 7 |
| Video Generation | VBench standard caption set | Subject Consistency: 96.05 | 5 |