
VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation

About

While recent video diffusion models (VDMs) produce visually impressive results, they fundamentally struggle to maintain 3D structural consistency, often resulting in object deformation or spatial drift. We hypothesize that these failures arise because standard denoising objectives lack explicit incentives for geometric coherence. To address this, we introduce VideoGPA (Video Geometric Preference Alignment), a data-efficient self-supervised framework that leverages a geometry foundation model to automatically derive dense preference signals that guide VDMs via Direct Preference Optimization (DPO). This approach effectively steers the generative distribution toward inherent 3D consistency without requiring human annotations. VideoGPA significantly enhances temporal stability, geometric plausibility, and motion coherence using minimal preference pairs, consistently outperforming state-of-the-art baselines in extensive experiments.
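For orientation, the generic DPO objective for a single preference pair can be sketched as below. This is the standard formulation from the DPO literature, not necessarily the exact loss used by VideoGPA, which may use a diffusion-specific variant. The function name `dpo_loss`, its argument names, and the default `beta` are illustrative assumptions; the "winner" is assumed to be the sample that the geometry model scores as more 3D-consistent.

```python
import math


def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Generic DPO loss for one preference pair (illustrative, not the paper's code).

    logp_*     : log-likelihood of the preferred / rejected sample under the model being tuned
    ref_logp_* : the same quantities under the frozen reference model
    beta       : strength of the implicit KL regularization (assumed default)
    """
    # Implicit reward margin: how much more the tuned model favors the
    # preferred sample over the rejected one, relative to the reference.
    margin = (logp_win - ref_logp_win) - (logp_lose - ref_logp_lose)
    x = beta * margin
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed in a numerically stable way.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# The loss is lower when the tuned model favors the preferred sample.
good = dpo_loss(-1.0, -3.0, -2.0, -2.0)
bad = dpo_loss(-3.0, -1.0, -2.0, -2.0)
print(good < bad)
```

With all log-likelihoods equal the margin is zero and the loss equals log 2; it shrinks as the tuned model shifts probability toward the preferred sample.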

Hongyang Du, Junjie Ye, Xiaoyan Cong, Runhao Li, Jingcheng Ni, Aman Agarwal, Zeqi Zhou, Zekun Li, Randall Balestriero, Yue Wang • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Video Generation | VBench 1.0 (test) | -- | 21 |
| Video Generation | Standard Video Evaluation Benchmark | Subject Consistency: 0.8511 | 10 |
| Video Generation | DL3DV RealEstate10K Static-scene benchmark | Epipolar Consistency: 0.098 | 10 |
| Video Generation | MiraData Dynamic-scene benchmark | VQ Score: 42.5 | 8 |
| Text-to-Video Generation | DL3DV-10K 1K (test) | PSNR: 17.31 | 7 |
| Geometric Consistency Evaluation | Curated Prompt Set (OpenVid-1M & DL3DV) Dynamic Complex 1.0 (val) | MEt3R Score: 9.872 | 7 |
| Geometric Consistency Evaluation | Curated Prompt Set (OpenVid-1M & DL3DV) Static Simple 1.0 (val) | MEt3R: 6.11 | 7 |
| Geometric Consistency Evaluation | Curated Prompt Set (OpenVid-1M & DL3DV) Static Complex 1.0 (val) | MEt3R: 6.882 | 7 |
| Geometric Consistency Evaluation | Curated Prompt Set (OpenVid-1M & DL3DV) Dynamic Simple 1.0 (val) | MEt3R: 18.84 | 7 |
| Video Generation | VBench standard caption set | Subject Consistency: 96.05 | 5 |
Showing 10 of 12 rows
