Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

About

We present a foundation model for zero-shot metric monocular depth estimation. Our model, Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and high-frequency details. The predictions are metric, with absolute scale, without relying on the availability of metadata such as camera intrinsics. And the model is fast, producing a 2.25-megapixel depth map in 0.3 seconds on a standard GPU. These characteristics are enabled by a number of technical contributions, including an efficient multi-scale vision transformer for dense prediction, a training protocol that combines real and synthetic datasets to achieve high metric accuracy alongside fine boundary tracing, dedicated evaluation metrics for boundary accuracy in estimated depth maps, and state-of-the-art focal length estimation from a single image. Extensive experiments analyze specific design choices and demonstrate that Depth Pro outperforms prior work along multiple dimensions. We release code and weights at https://github.com/apple/ml-depth-pro

Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R. Richter, Vladlen Koltun • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Human Pose Estimation | Human3.6M (test) | - | - | 570 |
| Novel View Synthesis | Tanks&Temples (test) | - | - | 289 |
| Video Depth Estimation | Sintel | Delta Threshold Accuracy (1.25) | 55.9 | 235 |
| Monocular Depth Estimation | KITTI | Abs Rel | 6.8 | 220 |
| Depth Estimation | NYU Depth V2 | RMSE | 0.387 | 209 |
| Monocular Depth Estimation | NYU V2 | Delta 1 Acc | 97.8 | 174 |
| Monocular Depth Estimation | ETH3D | AbsRel | 0.327 | 159 |
| Depth Estimation | KITTI | RMSE | 3.375 | 156 |
| Monocular Depth Estimation | DIODE | AbsRel | 6.1 | 147 |
| Monocular Depth Estimation | Sintel | Abs Rel | 0.508 | 127 |
Showing 10 of 222 rows
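
The metrics named in the benchmark rows (Abs Rel, RMSE, and delta threshold accuracy at 1.25) are not defined on this page. The sketch below uses their commonly used definitions in depth estimation, computed over flat lists of positive depth values. It is an illustration only, not the evaluation code behind these results: the function names and the sample values are hypothetical, and each benchmark may apply its own valid-pixel masks, depth caps, or alignment steps. The functions return fractions, whereas some results above (for example 97.8) appear to be reported as percentages.

```python
import math


def abs_rel(pred, gt):
    """Mean absolute relative error: mean(|pred - gt| / gt)."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)


def rmse(pred, gt):
    """Root mean squared error, in the same unit as the depth values."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))


def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of pixels where max(pred/gt, gt/pred) is below the threshold."""
    hits = sum(1 for p, g in zip(pred, gt) if max(p / g, g / p) < threshold)
    return hits / len(gt)


# Hypothetical depths in meters; assumes every value is strictly positive.
gt = [1.0, 2.0, 4.0, 8.0]
pred = [1.1, 1.8, 4.0, 12.0]

print(f"Abs Rel: {abs_rel(pred, gt):.3f}")
print(f"RMSE:    {rmse(pred, gt):.3f}")
print(f"Delta 1: {delta_accuracy(pred, gt):.2f}")
```

Lower is better for Abs Rel and RMSE, and higher is better for delta accuracy. In the sample data, only the last pixel (12.0 predicted against 8.0 true, a ratio of 1.5) falls outside the 1.25 threshold.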
