
Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion

About

Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars. In this paper, we study the problem of predicting dense depth from a single RGB image (monodepth) with optional sparse measurements from low-cost active depth sensors. We introduce Sparse Auxiliary Networks (SANs), a new module enabling monodepth networks to perform both the tasks of depth prediction and completion, depending on whether only RGB images or also sparse point clouds are available at inference time. First, we decouple the image and depth map encoding stages using sparse convolutions to process only the valid depth map pixels. Second, we inject this information, when available, into the skip connections of the depth prediction network, augmenting its features. Through extensive experimental analysis on one indoor (NYUv2) and two outdoor (KITTI and DDAD) benchmarks, we demonstrate that our proposed SAN architecture is able to simultaneously learn both tasks, while achieving a new state of the art in depth prediction by a significant margin.
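The two ideas in the abstract — convolving only over valid depth pixels, then adding the result to the RGB skip features when depth is available — can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation: the function names (`sparse_depth_features`, `inject_skip`), the 3×3 averaging "convolution", and the additive injection with a scalar `weight` are all assumptions made for clarity.

```python
import numpy as np

def sparse_depth_features(depth, kernel=3):
    """Toy sparse convolution: average the *valid* depth values in each
    kernel window, normalizing by the number of valid neighbors so that
    missing pixels (encoded as 0) do not dilute the result."""
    valid = (depth > 0).astype(float)
    h, w = depth.shape
    pad = kernel // 2
    d = np.pad(depth * valid, pad)
    m = np.pad(valid, pad)
    out = np.zeros((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            n = m[i:i + kernel, j:j + kernel].sum()
            if n > 0:
                out[i, j] = d[i:i + kernel, j:j + kernel].sum() / n
    return out

def inject_skip(rgb_skip, depth=None, weight=1.0):
    """Augment an RGB skip-connection feature map with sparse depth
    features when a (partial) depth map is given (completion mode);
    otherwise pass the RGB features through unchanged (prediction mode)."""
    if depth is None:
        return rgb_skip
    return rgb_skip + weight * sparse_depth_features(depth)
```

With `depth=None` the network degenerates to plain monocular depth prediction, which is what lets a single SAN-equipped model serve both tasks at inference time.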

Vitor Guizilini, Rares Ambrus, Wolfram Burgard, Adrien Gaidon • 2021

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Monocular Depth Estimation | NYU v2 (test) | Abs Rel | 0.106 | 257 |
| Depth Completion | NYU-depth-v2 official (test) | RMSE | 0.12 | 187 |
| Monocular Depth Estimation | DDAD (test) | RMSE | 11.936 | 122 |
| Monocular Depth Estimation | KITTI (test) | Abs Rel Error | 9.12 | 103 |
| Monocular Depth Estimation | KITTI Eigen split (test) | AbsRel Mean | 0.062 | 94 |
| Depth Estimation | KITTI (official split) | Absolute Relative Error | 2.35 | 10 |
| Monocular Depth Estimation | KITTI (official) | SILog | 11.54 | 9 |
| Monocular Depth Estimation | DDAD DE | AUC (edges) | 31.52 | 2 |
| Monocular Depth Estimation | Synscapes | AUC | 61.17 | 2 |
