
MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision

About

We present MoGe, a powerful model for recovering 3D geometry from monocular open-domain images. Given a single image, our model directly predicts a 3D point map of the captured scene with an affine-invariant representation, which is agnostic to true global scale and shift. This new representation precludes ambiguous supervision in training and facilitates effective geometry learning. Furthermore, we propose a set of novel global and local geometry supervisions that empower the model to learn high-quality geometry. These include a robust, optimal, and efficient point cloud alignment solver for accurate global shape learning, and a multi-scale local geometry loss promoting precise local geometry supervision. We train our model on a large, mixed dataset and demonstrate its strong generalizability and high accuracy. In our comprehensive evaluation on diverse unseen datasets, our model significantly outperforms state-of-the-art methods across all tasks, including monocular estimation of 3D point map, depth map, and camera field of view. Code and models can be found on our project page.
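To make the affine-invariant idea concrete, here is a minimal sketch of aligning a predicted point set to a reference with one global scale and one shift before comparing them. It is not the solver described in the abstract: it uses plain least squares, which is neither robust nor the authors' method, and it assumes a scalar scale `s` and a 3D shift `t`. The function names `align_scale_shift` and `apply_scale_shift` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def align_scale_shift(pred: Sequence[Point],
                      target: Sequence[Point]) -> Tuple[float, Point]:
    """Return (s, t) minimizing sum ||s * p + t - q||^2 over paired points.

    Assumption: ordinary least squares, used only for illustration.
    """
    n = len(pred)
    if n == 0 or n != len(target):
        raise ValueError("pred and target must be non-empty and equally long")

    # Centroids of both point sets.
    mean_p = tuple(sum(p[k] for p in pred) / n for k in range(3))
    mean_q = tuple(sum(q[k] for q in target) / n for k in range(3))

    # Closed-form scale: covariance of centered points over variance of pred.
    num = 0.0
    den = 0.0
    for p, q in zip(pred, target):
        for k in range(3):
            dp = p[k] - mean_p[k]
            dq = q[k] - mean_q[k]
            num += dp * dq
            den += dp * dp
    if den == 0.0:
        raise ValueError("pred points are all identical; scale is undefined")

    s = num / den
    # The shift maps the scaled pred centroid onto the target centroid.
    t = tuple(mean_q[k] - s * mean_p[k] for k in range(3))
    return s, t


def apply_scale_shift(points: Sequence[Point], s: float, t: Point) -> List[Point]:
    """Apply q = s * p + t to every point."""
    return [tuple(s * p[k] + t[k] for k in range(3)) for p in points]


if __name__ == "__main__":
    pred = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (0.0, 1.0, 3.0), (1.0, 1.0, 4.0)]
    # Synthetic reference: the same shape, scaled by 2 and shifted.
    target = [(2 * x + 0.5, 2 * y - 1.0, 2 * z + 3.0) for x, y, z in pred]
    s, t = align_scale_shift(pred, target)
    print(s, t)
```

Because the prediction is only compared after this kind of alignment, two point maps that differ by a global scale and shift count as the same geometry. A least-squares fit is sensitive to outliers, which is one reason a robust solver is preferable in training.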

Ruicheng Wang, Sicheng Xu, Cassie Dai, Jianfeng Xiang, Yu Deng, Xin Tong, Jiaolong Yang • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Novel View Synthesis | Tanks&Temples (test) | -- | 257 |
| Monocular Depth Estimation | KITTI | Abs Rel 0.049 | 203 |
| Video Depth Estimation | Sintel | -- | 193 |
| Monocular Depth Estimation | ETH3D | AbsRel 2.96 | 132 |
| Monocular Depth Estimation | NYU V2 | Delta 1 Acc 98.5 | 131 |
| Monocular Depth Estimation | DIODE | AbsRel 3.23 | 113 |
| Depth Estimation | ScanNet | -- | 108 |
| Monocular Depth Estimation | Sintel | Abs Rel 0.2181 | 91 |
| Novel View Synthesis | ScanNet++ | PSNR 20.82 | 67 |
| Depth Estimation | DIODE | Relative Error (REL) 31.3 | 63 |

Showing 10 of 90 rows
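For readers unfamiliar with the depth metrics named in the table, the sketch below computes the two most common ones using their standard definitions: absolute relative error (AbsRel) and the delta-1 accuracy with the usual 1.25 threshold. The function names are hypothetical, and the code returns fractions; some rows in the table may report the same quantities as percentages.

```python
from typing import Sequence


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean of |pred - gt| / gt over pixels with positive ground-truth depth."""
    errors = [abs(p - g) / g for p, g in zip(pred, gt) if g > 0]
    if not errors:
        raise ValueError("no valid ground-truth depths")
    return sum(errors) / len(errors)


def delta1(pred: Sequence[float], gt: Sequence[float],
           threshold: float = 1.25) -> float:
    """Fraction of valid pixels where max(pred/gt, gt/pred) < threshold."""
    # Both depths must be positive for the ratio to be meaningful.
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    if not pairs:
        raise ValueError("no valid depth pairs")
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    return hits / len(pairs)


if __name__ == "__main__":
    pred = [1.0, 2.0, 4.0]
    gt = [1.0, 2.5, 2.0]
    print(abs_rel(pred, gt), delta1(pred, gt))
```

Lower is better for AbsRel and higher is better for delta-1. For scale-ambiguous predictions, these metrics are typically computed after aligning the prediction to the ground truth.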
