
Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction

About

We introduce Geo4D, a method to repurpose video diffusion models for monocular 3D reconstruction of dynamic scenes. By leveraging the strong dynamic priors captured by large-scale pre-trained video models, Geo4D can be trained using only synthetic data while generalizing well to real data in a zero-shot manner. Geo4D predicts several complementary geometric modalities, namely point, disparity, and ray maps. We propose a new multi-modal alignment algorithm to align and fuse these modalities, as well as a sliding window approach at inference time, thus enabling robust and accurate 4D reconstruction of long videos. Extensive experiments across multiple benchmarks show that Geo4D significantly surpasses state-of-the-art video depth estimation methods.
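The abstract mentions a sliding-window approach for long videos but gives no details on this page. The following is a minimal sketch under our own assumptions, not Geo4D's algorithm. It splits a video into overlapping windows and chains per-window depth predictions by fitting a single least-squares scale on the frames each new window shares with the already-fused sequence. The function names `sliding_windows` and `stitch_depth`, the window and stride values, and the scale-only alignment are all hypothetical. Geo4D's own alignment fuses point, disparity, and ray maps, which this sketch ignores.

```python
import numpy as np


def sliding_windows(num_frames: int, window: int, stride: int) -> list[tuple[int, int]]:
    """Return [start, end) ranges of overlapping windows covering every frame.

    Hypothetical helper: window/stride are illustrative, not Geo4D's settings.
    """
    if window >= num_frames:
        return [(0, num_frames)]
    starts = list(range(0, num_frames - window + 1, stride))
    if starts[-1] + window < num_frames:
        # Add a final window flush with the last frame so nothing is left uncovered.
        starts.append(num_frames - window)
    return [(s, s + window) for s in starts]


def stitch_depth(windows, preds, num_frames):
    """Fuse per-window depth maps (each of shape (e - s, H, W)) into one sequence.

    Assumption: each window differs from the fused result only by a global scale.
    The scale is the least-squares fit on overlapping frames, then overlaps are averaged.
    """
    h, w = preds[0].shape[1:]
    acc = np.zeros((num_frames, h, w))  # running sum of rescaled predictions
    cnt = np.zeros(num_frames)          # how many windows contributed to each frame
    for (s, e), p in zip(windows, preds):
        seen = cnt[s:e] > 0  # frames already covered by earlier windows
        if seen.any():
            ref = acc[s:e][seen] / cnt[s:e][seen][:, None, None]
            cur = p[seen]
            # argmin_k ||ref - k * cur||^2  =>  k = <ref, cur> / <cur, cur>
            scale = float((ref * cur).sum() / (cur * cur).sum())
        else:
            scale = 1.0  # the first window fixes the global scale
        acc[s:e] += scale * p
        cnt[s:e] += 1
    return acc / cnt[:, None, None]
```

For example, a 10-frame clip with `window=4` and `stride=3` yields the windows `(0, 4)`, `(3, 7)`, `(6, 10)`, each sharing one frame with its predecessor. A scale-only fit is the simplest choice that keeps consecutive windows consistent; a real system would likely also need shift, rotation, and translation alignment for point and ray maps.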

Zeren Jiang, Chuanxia Zheng, Iro Laina, Diane Larlus, Andrea Vedaldi · 2025

Related benchmarks

Task                      Dataset        Result         Rank
Camera pose estimation    TUM dynamics   --             81
Video Depth Estimation    TUM dynamics   Abs Rel 0.175  53
Geometric Reconstruction  DDAD (test)    Relp 14.58     8
Geometric Reconstruction  Monkaa (test)  Relp 28.04     8
Geometric Reconstruction  Sintel (test)  Relp 34.61     8
Camera pose estimation    Bonn 3 scenes  ATE 36.7       5
Video Depth Estimation    BEDLAM         Abs Rel 0.058  5
Video Depth Estimation    Bonn 3 scenes  Abs Rel 0.087  5
Camera pose estimation    GTA-IM         ATE 0.107      5
Video Depth Estimation    GTA-IM         Abs Rel 0.218  5
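The video depth rows report Abs Rel, the mean of |d̂ − d| / d over valid pixels, where d̂ is the predicted and d the ground-truth depth (lower is better). Because monocular depth is only recovered up to an unknown scale (and often shift), benchmarks typically align the prediction to the ground truth before scoring. The sketch below assumes one least-squares scale-and-shift fit per sequence in depth space. The exact protocol behind the numbers above is not given on this page, and some benchmarks align in disparity space instead. The function name `abs_rel` is hypothetical.

```python
import numpy as np


def abs_rel(pred, gt, valid=None):
    """Absolute relative error after a single scale-and-shift fit for the whole sequence.

    pred, gt: depth arrays of shape (T, H, W).
    valid: optional boolean mask of the same shape; defaults to pixels with gt > 0.
    Assumption: alignment is done in depth space, once per sequence.
    """
    if valid is None:
        valid = gt > 0
    p = pred[valid].astype(np.float64)
    g = gt[valid].astype(np.float64)
    # Solve for [s, t] minimising ||s * p + t - g||^2 over all valid pixels.
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    aligned = s * p + t
    return float(np.mean(np.abs(aligned - g) / g))
```

Fitting one scale and shift over the whole sequence, instead of per frame, also penalizes scale flicker between frames. That temporal consistency is what video depth benchmarks aim to measure.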
