UniDAC: Universal Metric Depth Estimation for Any Camera
About
Monocular metric depth estimation (MMDE) is a core challenge in computer vision, playing a pivotal role in real-world applications that demand accurate spatial understanding. Although prior works have shown promising zero-shot performance in MMDE, they often struggle to generalize across diverse camera types, such as fisheye and $360^\circ$ cameras. Recent advances have addressed this through unified camera representations or canonical representation spaces, but they require either large-FoV camera data during training or separately trained models for different domains. We propose UniDAC, an MMDE framework that is robust in all domains and generalizes across diverse cameras using a single model. We achieve this by decoupling metric depth estimation into relative depth prediction and spatially varying scale estimation. We propose a lightweight Depth-Guided Scale Estimation module that upsamples a coarse scale map to high resolution, using the relative depth map as guidance to account for local scale variations. Furthermore, we introduce RoPE-$\phi$, a distortion-aware positional embedding that respects the spatial warping in Equi-Rectangular Projections (ERP) via latitude-aware weighting. UniDAC achieves state-of-the-art (SoTA) cross-camera generalization, consistently outperforming prior methods across all datasets.
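
The decoupling described above can be illustrated with a minimal NumPy sketch: a coarse scale map is brought to the resolution of the relative depth map, and the two are multiplied per pixel to obtain metric depth. This is an illustration under stated assumptions, not the UniDAC implementation. The function names `upsample_nearest` and `compose_metric_depth` are hypothetical, and plain nearest-neighbour replication stands in for the Depth-Guided Scale Estimation module, whose actual design is not specified here.

```python
import numpy as np


def upsample_nearest(coarse: np.ndarray, factor: int) -> np.ndarray:
    """Repeat each coarse cell factor x factor times.

    Assumption: this nearest-neighbour replication is only a stand-in for
    the depth-guided upsampling described in the abstract.
    """
    return np.kron(coarse, np.ones((factor, factor), dtype=coarse.dtype))


def compose_metric_depth(
    relative_depth: np.ndarray, coarse_scale: np.ndarray, factor: int
) -> np.ndarray:
    """Combine relative depth with a spatially varying scale map."""
    scale = upsample_nearest(coarse_scale, factor)
    if scale.shape != relative_depth.shape:
        raise ValueError("upsampled scale map must match the depth map shape")
    # Per-pixel product: each region of the image gets its own metric scale.
    return relative_depth * scale


# Toy inputs: a constant 4x4 relative depth map and a 2x2 coarse scale map.
relative = np.full((4, 4), 0.5)
coarse_scale = np.array([[2.0, 4.0], [6.0, 8.0]])
metric = compose_metric_depth(relative, coarse_scale, factor=2)
print(metric)
```

With a single global scale, every pixel would be multiplied by the same factor. The spatially varying map lets each image region carry its own scale, which is the property the abstract attributes to the decoupled formulation.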
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Depth Estimation | Matterport3D | δ1 accuracy (%) | 74.5 | 50 |
| Depth Estimation | Pano3D GibsonV2 | Absolute relative error | 0.161 | 24 |
| Monocular Depth Estimation | Pano3D zero-shot GV2 | δ1 accuracy (%) | 76.8 | 19 |
| Monocular Metric Depth Estimation | ScanNet++ | δ1 accuracy (%) | 91.8 | 17 |
| Monocular Depth Estimation | ScanNet++ Fisheye | δ1 accuracy (%) | 91.8 | 14 |
| Monocular Depth Estimation | Perspective average of KITTI, NYU-v2, IBims-1 | δ1 accuracy (%) | 84.5 | 14 |
| Monocular Metric Depth Estimation | KITTI-360 | δ1 accuracy (%) | 83.6 | 6 |
| Depth Estimation | Perspective datasets: KITTI, NYU-v2, and IBims-1 (hard samples) | δ1 accuracy (%) | 75.4 | 6 |
| Depth Estimation | ScanNet++ (hard samples) | δ1 accuracy (%) | 78.9 | 6 |
| Depth Estimation | Matterport3D and Pano3D-GV2 (hard samples) | δ1 accuracy (%) | 54.7 | 6 |
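
For readers unfamiliar with the two metrics in the table, the sketch below computes them using their standard definitions in the depth estimation literature: δ1 is the fraction of valid pixels where max(pred/gt, gt/pred) < 1.25, and absolute relative error is the mean of |pred − gt| / gt. The function names and the validity mask (`gt > 0`) are assumptions for illustration, and the exact evaluation protocol of each benchmark may differ.

```python
import numpy as np


def delta1(pred: np.ndarray, gt: np.ndarray, threshold: float = 1.25) -> float:
    """Fraction of valid pixels whose depth ratio is below the threshold.

    Assumes predictions are strictly positive; pixels with gt <= 0 are
    treated as missing ground truth and ignored.
    """
    valid = gt > 0
    ratio = np.maximum(pred[valid] / gt[valid], gt[valid] / pred[valid])
    return float(np.mean(ratio < threshold))


def abs_rel(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute error relative to ground-truth depth over valid pixels."""
    valid = gt > 0
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))


# Toy example: the last pixel has no ground truth and is excluded.
gt = np.array([1.0, 2.0, 4.0, 0.0])
pred = np.array([1.1, 2.0, 6.0, 3.0])

d1 = delta1(pred, gt)   # ratios 1.1, 1.0, 1.5: two of three pass
err = abs_rel(pred, gt)  # relative errors 0.1, 0.0, 0.5
print(f"delta1 = {100 * d1:.1f}%, AbsRel = {err:.3f}")
```

Multiplying δ1 by 100 gives the percentage form used in the table. Higher δ1 is better, and lower absolute relative error is better.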