Towards Zero-Shot Scale-Aware Monocular Depth Estimation

About

Monocular depth estimation is scale-ambiguous, and thus requires scale supervision to produce metric predictions. Even so, the resulting models will be geometry-specific, with learned scales that cannot be directly transferred across domains. Because of that, recent works focus instead on relative depth, eschewing scale in favor of improved up-to-scale zero-shot transfer. In this work we introduce ZeroDepth, a novel monocular depth estimation framework capable of predicting metric scale for arbitrary test images from different domains and camera parameters. This is achieved by (i) the use of input-level geometric embeddings that enable the network to learn a scale prior over objects; and (ii) decoupling the encoder and decoder stages, via a variational latent representation that is conditioned on single frame information. We evaluated ZeroDepth targeting both outdoor (KITTI, DDAD, nuScenes) and indoor (NYUv2) benchmarks, and achieved a new state-of-the-art in both settings using the same pre-trained model, outperforming methods that train on in-domain data and require test-time scaling to produce metric estimates.
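The abstract does not spell out the form of the input-level geometric embeddings. As an illustration only, one common way to expose camera geometry to a network is to compute a unit viewing ray for every pixel from the intrinsics matrix. The sketch below assumes a pinhole camera; the function name `pixel_rays` and the half-pixel offset convention are hypothetical choices, not taken from the paper.

```python
import numpy as np

def pixel_rays(K, height, width):
    """Return unit viewing rays of shape (height, width, 3) for a pinhole camera.

    K is the 3x3 intrinsics matrix. This is an assumed, generic encoding of
    camera geometry, not necessarily the embedding used by ZeroDepth.
    """
    # Pixel centres in image coordinates (u along columns, v along rows).
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)  # homogeneous coords
    # Back-project each pixel: ray = K^-1 @ pixel (written for row vectors).
    rays = pixels @ np.linalg.inv(K).T
    # Normalise so that only the direction, not the depth, is encoded.
    return rays / np.linalg.norm(rays, axis=-1, keepdims=True)

# Example: a tiny 3x4 image whose principal point sits on pixel (row 1, col 1).
K = np.array([[100.0, 0.0, 1.5],
              [0.0, 100.0, 1.5],
              [0.0, 0.0, 1.0]])
rays = pixel_rays(K, height=3, width=4)
```

Because the rays depend on the focal length and principal point, two cameras viewing the same object at different apparent sizes produce different ray maps, which is the kind of cue a network could use to reason about metric scale.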

Vitor Guizilini, Igor Vasiljevic, Dian Chen, Rares Ambrus, Adrien Gaidon • 2023

Related benchmarks

Task	Dataset	Result
Monocular Depth Estimation	NYU v2 (test)	Abs Rel 0.081	320
Monocular Depth Estimation	KITTI (Eigen split)	Abs Rel 0.102	215
Monocular Depth Estimation	Sintel	Abs Rel 0.703	127
Monocular Depth Estimation	DDAD (test)	RMSE 6.318	122
Monocular Depth Estimation	KITTI (test)	Abs Rel Error 0.064	114
Monocular Depth Estimation	KITTI Eigen split (test)	AbsRel Mean 10.2	100
Metric Depth Estimation	NYU Metric Depth v2 (test)	Delta 1 Accuracy 95.4	33
Monocular Depth Estimation	SUN-RGBD (test)	AbsRel 0.121	31
Metric Depth Estimation	KITTI in-domain (test)	Acc (δ < 1.25) 96.8	27
Monocular Depth Estimation	Diode Indoor (test)	A.Rel 0.309	25

Showing 10 of 19 rows
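For reference, the metrics named in the table (absolute relative error, RMSE, and δ < 1.25 accuracy) have standard definitions in the depth estimation literature. The following is a minimal sketch of those definitions, assuming strictly positive predictions and treating ground-truth values of zero as invalid pixels. The function name `depth_metrics` and the optional `median_scale` flag are hypothetical; the flag illustrates the test-time scaling that the abstract says ZeroDepth does not need.

```python
import numpy as np

def depth_metrics(pred, gt, median_scale=False):
    """Compute Abs Rel, RMSE and delta < 1.25 accuracy over valid pixels."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    valid = gt > 0  # assumption: zero marks missing ground truth
    pred, gt = pred[valid], gt[valid]
    if median_scale:
        # Test-time scaling used by scale-ambiguous (relative depth) methods.
        pred = pred * np.median(gt) / np.median(pred)
    ratio = np.maximum(pred / gt, gt / pred)
    return {
        "abs_rel": float(np.mean(np.abs(pred - gt) / gt)),
        "rmse": float(np.sqrt(np.mean((pred - gt) ** 2))),
        "delta1": float(np.mean(ratio < 1.25)),
    }

# A prediction that is correct up to scale (here, exactly twice too large).
gt_depth = [1.0, 2.0, 4.0]
pred_depth = [2.0, 4.0, 8.0]
metric = depth_metrics(pred_depth, gt_depth)
scaled = depth_metrics(pred_depth, gt_depth, median_scale=True)
```

In this toy case the unscaled prediction has a large metric error, while the median-scaled one matches the ground truth exactly, which shows why up-to-scale methods are usually reported with test-time scaling and metric methods without it.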
