MetricAnything: Scaling Metric Depth Pretraining with Noisy Heterogeneous Sources

About

Scaling has powered recent advances in vision foundation models, yet extending this paradigm to metric depth estimation remains challenging due to heterogeneous sensor noise, camera-dependent biases, and metric ambiguity in noisy cross-source 3D data. We introduce Metric Anything, a simple and scalable pretraining framework that learns metric depth from noisy, diverse 3D sources without manually engineered prompts, camera-specific modeling, or task-specific architectures. Central to our approach is the Sparse Metric Prompt, created by randomly masking depth maps, which serves as a universal interface that decouples spatial reasoning from sensor and camera biases. Using about 20M image-depth pairs spanning reconstructed, captured, and rendered 3D data across 10,000 camera models, we demonstrate, for the first time, a clear scaling trend in the metric depth track. The pretrained model excels at prompt-driven tasks such as depth completion, super-resolution, and radar-camera fusion, while its distilled prompt-free student achieves state-of-the-art results on monocular depth estimation, camera intrinsics recovery, single/multi-view metric 3D reconstruction, and VLA planning. We also show that using the pretrained ViT from Metric Anything as a visual encoder significantly boosts the spatial intelligence capabilities of Multimodal Large Language Models. These results show that metric depth estimation can benefit from the same scaling laws that drive modern foundation models, establishing a new path toward scalable and efficient real-world metric perception. We open-source MetricAnything at http://metric-anything.github.io/metric-anything-io/ to support community research.
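The abstract says only that the Sparse Metric Prompt is "created by randomly masking depth maps". As a minimal sketch of that idea, and not the authors' implementation, the NumPy snippet below keeps a random fraction of valid depth pixels and zeroes the rest. The function name `make_sparse_prompt`, the `keep_ratio` parameter, and the convention that 0 marks a missing pixel are all assumptions made for illustration.

```python
import numpy as np


def make_sparse_prompt(depth, keep_ratio=0.01, rng=None):
    """Randomly keep a fraction of valid depth pixels (hypothetical helper).

    depth      : 2D array of metric depth; non-positive or non-finite values
                 are treated as invalid (an assumed convention).
    keep_ratio : probability of keeping each valid pixel (assumed parameter).
    Returns (sparse_depth, keep_mask); dropped pixels are set to 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    depth = np.asarray(depth, dtype=np.float64)

    # Only pixels with a usable measurement can become prompt points.
    valid = np.isfinite(depth) & (depth > 0)

    # Independent Bernoulli draw per pixel: keep with probability keep_ratio.
    keep = valid & (rng.random(depth.shape) < keep_ratio)

    # Dropped pixels are zeroed so the prompt has the same shape as the input.
    sparse = np.where(keep, depth, 0.0)
    return sparse, keep


if __name__ == "__main__":
    dense = np.full((64, 64), 2.0)  # toy depth map: everything 2 m away
    sparse, mask = make_sparse_prompt(dense, keep_ratio=0.1,
                                      rng=np.random.default_rng(0))
    print("kept pixels:", int(mask.sum()), "of", dense.size)
```

Varying `keep_ratio` per sample would be one way to expose a model to prompts of different densities, though the listing does not say how the masking ratio is actually chosen.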

Baorui Ma, Jiahui Yang, Donglin Di, Xuancheng Zhang, Jianxun Cui, Hao Li, Yan Xie, Wei Chen • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Robot Manipulation | LIBERO | Goal Achievement | 88.8 | 494 |
| Monocular Depth Estimation | KITTI | Abs Rel | 5.47 | 161 |
| Monocular Depth Estimation | ETH3D | AbsRel | 0.147 | 117 |
| Monocular Depth Estimation | DIODE | AbsRel | 13.9 | 93 |
| Depth Super-Resolution / Completion | ETH-3D (test) | AbsRel | 0.84 | 41 |
| Depth Super-Resolution / Completion | NYU v2 (test) | AbsRel | 1.53 | 36 |
| Depth Super-Resolution / Completion | KITTI (test) | AbsRel | 2.34 | 36 |
| Monocular Depth Estimation | iBIMS-1 | ARel | 0.0947 | 32 |
| Monocular Depth Estimation | Booster | δ1 | 59.5 | 26 |
| Monocular Depth Estimation | Sintel | Abs Rel | 0.792 | 21 |

Showing 10 of 22 rows
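For readers unfamiliar with the metric names in these rows, here is a short sketch of the commonly used definitions of absolute relative error (AbsRel) and threshold accuracy (δ1, with the usual threshold of 1.25). These are the standard depth-estimation definitions, not formulas taken from this listing, and the function names are illustrative. The listing also does not state whether a given value is a fraction or a percentage, so the sketch returns plain fractions.

```python
import numpy as np


def abs_rel(pred, gt):
    """Mean of |pred - gt| / gt over pixels with valid ground truth (gt > 0)."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))


def delta1(pred, gt, threshold=1.25):
    """Fraction of valid pixels where max(pred/gt, gt/pred) < threshold."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = (gt > 0) & (pred > 0)
    ratio = np.maximum(pred[valid] / gt[valid], gt[valid] / pred[valid])
    return float(np.mean(ratio < threshold))


if __name__ == "__main__":
    gt = np.array([1.0, 2.0, 4.0])
    pred = np.array([1.1, 2.0, 6.0])
    print("AbsRel:", abs_rel(pred, gt))
    print("delta1:", delta1(pred, gt))
```

Lower is better for AbsRel and higher is better for δ1, which is worth keeping in mind when comparing rows that use different metrics.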
