Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction

About

We introduce Geo4D, a method to repurpose video diffusion models for monocular 3D reconstruction of dynamic scenes. By leveraging the strong dynamic priors captured by large-scale pre-trained video models, Geo4D can be trained using only synthetic data while generalizing well to real data in a zero-shot manner. Geo4D predicts several complementary geometric modalities, namely point, disparity, and ray maps. We propose a new multi-modal alignment algorithm to align and fuse these modalities, as well as a sliding window approach at inference time, thus enabling robust and accurate 4D reconstruction of long videos. Extensive experiments across multiple benchmarks show that Geo4D significantly surpasses state-of-the-art video depth estimation methods.

Zeren Jiang, Chuanxia Zheng, Iro Laina, Diane Larlus, Andrea Vedaldi • 2025

Related benchmarks

Task                      Dataset        Result         Rank
Camera pose estimation    TUM dynamics   RRE 0.48       57
Video Depth Estimation    TUM dynamics   Abs Rel 0.175  27
Geometric Reconstruction  DDAD (test)    Relp 14.58     8
Geometric Reconstruction  Monkaa (test)  Relp 28.04     8
Geometric Reconstruction  Sintel (test)  Relp 34.61     8
Camera pose estimation    Bonn 3 scenes  ATE 36.7       5
Video Depth Estimation    BEDLAM         Abs Rel 0.058  5
Video Depth Estimation    Bonn 3 scenes  Abs Rel 0.087  5
Camera pose estimation    GTA-IM         ATE 0.107      5
Video Depth Estimation    GTA-IM         Abs Rel 0.218  5
Showing 10 of 12 rows
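
For readers unfamiliar with the Abs Rel figures in the benchmark list above: absolute relative error is commonly defined as the mean of |predicted depth - ground-truth depth| / ground-truth depth over valid pixels, so lower is better. The snippet below is a minimal sketch of that definition, assuming NumPy. The function name `abs_rel` and the default validity mask are illustrative assumptions, and the sketch leaves out any scale or shift alignment a benchmark may apply before scoring. It is not the evaluation code behind the numbers listed here.

```python
import numpy as np


def abs_rel(pred, gt, valid=None):
    """Mean absolute relative error: mean(|pred - gt| / gt) over valid pixels."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if valid is None:
        # Assumption: non-positive ground truth marks a missing measurement.
        valid = gt > 0
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))


# Toy 2x2 depth maps; the zero in gt is treated as invalid and skipped.
gt = np.array([[1.0, 2.0], [4.0, 0.0]])
pred = np.array([[1.1, 1.8], [4.4, 3.0]])
print(abs_rel(pred, gt))
```

In this toy case every valid pixel is off by 10% of its ground-truth depth, so the score comes out at roughly 0.1.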
