Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model
About
While methods for monocular depth estimation have made significant strides on standard benchmarks, zero-shot metric depth estimation remains unsolved. Challenges include the joint modeling of indoor and outdoor scenes, which often exhibit significantly different distributions of RGB and depth, and the depth-scale ambiguity due to unknown camera intrinsics. Recent work has proposed specialized multi-head architectures for jointly modeling indoor and outdoor scenes. In contrast, we advocate a generic, task-agnostic diffusion model with several advancements, such as log-scale depth parameterization to enable joint modeling of indoor and outdoor scenes, conditioning on the field-of-view (FOV) to handle scale ambiguity, and synthetic FOV augmentation during training to generalize beyond the limited camera intrinsics in training datasets. Furthermore, by employing a more diverse training mixture than is common and an efficient diffusion parameterization, our method, DMD (Diffusion for Metric Depth), achieves a 25% reduction in relative error (REL) on zero-shot indoor datasets and a 33% reduction on zero-shot outdoor datasets over the current SOTA, using only a small number of denoising steps. For an overview, see https://diffusion-vision.github.io/dmd
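
The two ideas named in the abstract, log-scale depth parameterization and FOV conditioning, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the depth bounds `D_MIN` and `D_MAX`, the `[-1, 1]` target range, and all function names are assumptions chosen for illustration. The only standard fact used is the pinhole relation between focal length and field of view, `fov = 2 * atan(size / (2 * focal))`.

```python
import math

# Assumed depth range in metres. These bounds are illustrative only
# and are not stated in the abstract.
D_MIN = 0.5
D_MAX = 80.0


def depth_to_log_target(depth_m, d_min=D_MIN, d_max=D_MAX):
    """Map a metric depth to [-1, 1] on a log scale (hypothetical helper)."""
    d = min(max(depth_m, d_min), d_max)  # clip to the assumed range
    unit = math.log(d / d_min) / math.log(d_max / d_min)  # now in [0, 1]
    return 2.0 * unit - 1.0


def log_target_to_depth(target, d_min=D_MIN, d_max=D_MAX):
    """Invert depth_to_log_target, returning depth in metres."""
    unit = (min(max(target, -1.0), 1.0) + 1.0) / 2.0
    return d_min * math.exp(unit * math.log(d_max / d_min))


def fov_from_focal(size_px, focal_px):
    """Field of view in radians along one image axis (pinhole camera)."""
    return 2.0 * math.atan(size_px / (2.0 * focal_px))
```

On a log scale, nearby indoor depths and distant outdoor depths occupy comparable portions of the target range, which is one plausible reading of why the abstract credits this parameterization with enabling joint indoor and outdoor modeling. A value such as `fov_from_focal(height_px, focal_px)` could then be passed to the model as the conditioning signal; how DMD actually encodes it is not described in this section.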
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Monocular Depth Estimation | NYU v2 (test) | Abs Rel | 0.072 | 257 |
| Monocular Depth Estimation | DDAD (test) | RMSE | 5.365 | 122 |
| Depth Estimation | SUN RGB-D (test) | Root Mean Square Error (RMS) | 0.275 | 93 |
| Monocular Depth Estimation | KITTI Eigen (test) | AbsRel | 0.053 | 46 |
| Depth Estimation | iBims 1 (test) | REL | 0.118 | 41 |
| Monocular Depth Estimation | DIODE Indoor (test) | A.Rel | 0.291 | 25 |
| Monocular Depth Estimation | KITTI official (val) | RMSE | 2.411 | 23 |
| Monocular Depth Estimation | SUN-RGBD (test) | AbsRel | 0.109 | 22 |
| Monocular Depth Estimation | Virtual KITTI 2 (test) | Delta 1 Acc | 89 | 22 |
| Monocular Depth Estimation | DIODE Outdoor (test) | RMSE | 8.943 | 18 |
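
The benchmark entries above report absolute relative error, RMSE, and Delta 1 accuracy. As a rough guide to what these numbers measure, the sketch below implements the commonly used definitions in plain Python. The function names, the flat-list input format, and the convention of skipping pixels with non-positive ground truth are assumptions for illustration; individual leaderboards may apply different masks, depth caps, or units.

```python
import math


def _valid_pairs(pred, gt):
    """Keep only pixels with positive ground-truth depth (assumed convention)."""
    return [(p, g) for p, g in zip(pred, gt) if g > 0]


def abs_rel(pred, gt):
    """Mean of |pred - gt| / gt over valid pixels. Lower is better."""
    pairs = _valid_pairs(pred, gt)
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def rmse(pred, gt):
    """Root mean square error over valid pixels, in depth units. Lower is better."""
    pairs = _valid_pairs(pred, gt)
    return math.sqrt(sum((p - g) ** 2 for p, g in pairs) / len(pairs))


def delta1(pred, gt, threshold=1.25):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < threshold. Higher is better."""
    pairs = _valid_pairs(pred, gt)
    hits = sum(1 for p, g in pairs if p > 0 and max(p / g, g / p) < threshold)
    return hits / len(pairs)


# Toy example: three valid pixels and one invalid (gt == 0) pixel.
pred = [1.0, 2.2, 4.0, 3.0]
gt = [1.0, 2.0, 8.0, 0.0]
print(abs_rel(pred, gt), rmse(pred, gt), delta1(pred, gt))
```

Note that `delta1` returns a fraction in `[0, 1]`, whereas the Delta 1 Acc entry in the table appears to be expressed as a percentage.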