SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation

About

Recently, self-supervised monocular depth estimation has gained popularity, with numerous applications in autonomous driving and robotics. However, existing solutions primarily estimate depth from immediate visual features, and they struggle to recover fine-grained scene details and generalize poorly. In this paper, we introduce SQLdepth, a novel approach that can effectively learn fine-grained scene structures from motion. In SQLdepth, we propose a novel Self Query Layer (SQL) that builds a self-cost volume and infers depth from it, rather than inferring depth directly from feature maps. The self-cost volume implicitly captures the intrinsic geometry of the scene within a single frame: each slice of the volume encodes the relative distances between points and objects in a latent space. Finally, this volume is compressed into the depth map via a novel decoding approach. Experimental results on KITTI and Cityscapes show that our method attains state-of-the-art performance (AbsRel = 0.082 on KITTI, 0.052 on KITTI with improved ground truth, and 0.106 on Cityscapes), corresponding to 9.9%, 5.5% and 4.5% error reductions relative to the previous best. In addition, our approach offers reduced training complexity, computational efficiency, improved generalization, and the ability to recover fine-grained scene details. Moreover, the self-supervised pre-trained and metric fine-tuned SQLdepth surpasses existing supervised methods by a significant margin (AbsRel = 0.043, a 14% error reduction). The self-matching-oriented relative distance querying in SQL also improves the robustness and zero-shot generalization capability of SQLdepth. Code and pre-trained weights are available at https://github.com/hisfog/SQLdepth-Impl.
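The abstract describes the Self Query Layer only at a high level. As an illustration, here is a minimal NumPy sketch of one plausible reading; it is not the authors' implementation, which lives in the linked repository. It assumes that (1) the self-cost volume is formed by dot products between Q query vectors and every per-pixel feature of the same frame; (2) the queries are average-pooled patches of that same feature map, a hypothetical stand-in for the paper's learned query module; and (3) decoding follows an AdaBins-style scheme of adaptive depth bins plus a per-pixel softmax over bins. The function names `patch_queries`, `self_cost_volume` and `decode_depth`, the slice-mean bin scores, and the 0.1 to 80 m depth range are illustrative choices.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def patch_queries(features, patch=8):
    # HYPOTHETICAL stand-in for the learned query generator: average-pool
    # non-overlapping patch x patch windows of the SAME feature map, so the
    # queries come from the frame itself (a "self" query).
    c, h, w = features.shape
    gh, gw = h // patch, w // patch
    cropped = features[:, : gh * patch, : gw * patch]
    pooled = cropped.reshape(c, gh, patch, gw, patch).mean(axis=(2, 4))  # (C, gh, gw)
    return pooled.reshape(c, gh * gw).T  # (Q, C) with Q = gh * gw


def self_cost_volume(features, queries):
    # Slice q of the (Q, H, W) volume holds the dot product between query q
    # and every per-pixel feature vector of the frame.
    c, h, w = features.shape
    volume = queries @ features.reshape(c, h * w)  # (Q, H*W)
    return volume.reshape(queries.shape[0], h, w)


def decode_depth(volume, min_depth=0.1, max_depth=80.0):
    # ASSUMED AdaBins-style decoding: Q adaptive bins over [min_depth, max_depth]
    # plus a per-pixel probability distribution over those bins.
    q = volume.shape[0]
    # Hypothetical bin head: score each slice by its spatial mean
    # (a trained model would learn this mapping, e.g. with an MLP).
    scores = volume.reshape(q, -1).mean(axis=1)
    widths = softmax(scores, axis=0) * (max_depth - min_depth)  # widths sum to the range
    edges = min_depth + np.concatenate([[0.0], np.cumsum(widths)])
    centers = 0.5 * (edges[:-1] + edges[1:])  # (Q,) bin centres
    probs = softmax(volume, axis=0)  # per-pixel distribution over the Q bins
    return np.tensordot(centers, probs, axes=(0, 0))  # (H, W) expected depth


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((32, 24, 80))  # (C, H, W) toy feature map
    queries = patch_queries(feats, patch=8)    # (30, 32)
    volume = self_cost_volume(feats, queries)  # (30, 24, 80)
    depth = decode_depth(volume)               # (24, 80)
    print(queries.shape, volume.shape, depth.shape)  # prints (30, 32) (30, 24, 80) (24, 80)
```

Because each pixel's depth is a probability-weighted sum of bin centres, the output is a convex combination of depths inside the chosen range, so every pixel stays between `min_depth` and `max_depth` regardless of the feature values.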

Youhong Wang, Yunji Liang, Hao Xu, Shaohui Jiao, Hongkai Yu • 2023

Related benchmarks

Task	Dataset	Metric	Value	#
Monocular Depth Estimation	KITTI (Eigen)	AbsRel	0.082	523
Depth Estimation	KITTI (Eigen split)	RMSE	1.65	291
Monocular Depth Estimation	KITTI (Eigen split)	AbsRel	0.091	215
Monocular Depth Estimation	Make3D (test)	AbsRel	0.306	132
Monocular Depth Estimation	KITTI Improved GT (Eigen)	AbsRel	0.061	111
Monocular Depth Estimation	Cityscapes	Accuracy (delta < 1.25)	88.8	74
Monocular Depth Estimation	KITTI improved ground truth (Eigen split)	AbsRel	0.052	65
Monocular Depth Estimation	KITTI Eigen (test)	AbsRel	0.043	56
Monocular Depth Estimation	KITTI Raw (Eigen)	AbsRel	8.7	23
Monocular Depth Estimation	KITTI 14 (Eigen split)	AbsRel	0.075	12
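The table reports AbsRel, RMSE and delta < 1.25 accuracy. A minimal sketch of how these standard monocular-depth metrics are typically computed over pixels with valid ground truth follows, together with the relative error reduction quoted in the abstract, assumed here to be (previous - new) / previous. Protocol details used on KITTI, such as the Eigen crop and median scaling of self-supervised predictions, are omitted; the function names and the 80 m cap are illustrative assumptions.

```python
import numpy as np


def depth_metrics(pred, gt, min_depth=1e-3, max_depth=80.0):
    # Standard errors over pixels with valid ground truth (depths in metres).
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    valid = (gt > min_depth) & (gt < max_depth)  # skip missing / out-of-range GT
    p = np.clip(pred[valid], min_depth, max_depth)
    g = gt[valid]
    abs_rel = np.mean(np.abs(p - g) / g)         # AbsRel: mean |d - d*| / d*
    rmse = np.sqrt(np.mean((p - g) ** 2))        # RMSE in metres
    ratio = np.maximum(p / g, g / p)
    delta1 = np.mean(ratio < 1.25)               # fraction with max(d/d*, d*/d) < 1.25
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}


def relative_reduction(previous, new):
    # ASSUMED definition: error reduction of `new` relative to `previous`, as a fraction.
    return (previous - new) / previous


if __name__ == "__main__":
    # Toy example: the last pixel has no ground truth (0.0) and is ignored.
    m = depth_metrics([2.2, 4.0, 8.0, 5.0], [2.0, 4.0, 10.0, 0.0])
    print(m)
    print(relative_reduction(0.1, 0.09))  # about 0.1, i.e. a 10% reduction
```

Note that the third toy pixel has a ratio of exactly 1.25, so it fails the strict `< 1.25` test and delta1 comes out as 2/3.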
