Semantically-Guided Representation Learning for Self-Supervised Monocular Depth
About
Self-supervised learning is showing great promise for monocular depth estimation, using geometry as the only source of supervision. Depth networks are indeed capable of learning representations that relate visual appearance to 3D properties by implicitly leveraging category-level patterns. In this work we investigate how to more directly leverage this semantic structure to guide geometric representation learning, while remaining in the self-supervised regime. Instead of using semantic labels and proxy losses in a multi-task approach, we propose a new architecture that leverages fixed pretrained semantic segmentation networks to guide self-supervised representation learning via pixel-adaptive convolutions. Furthermore, we propose a two-stage training process to overcome a common semantic bias on dynamic objects via resampling. Our method improves upon the state of the art for self-supervised monocular depth prediction over all pixels, on fine-grained details, and per semantic category.
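
To illustrate the core mechanism, the sketch below shows how a pixel-adaptive convolution can modulate depth-network features with guidance features from a frozen semantic segmentation network: the shared convolution weights are reweighted per pixel by a kernel computed on the semantic features. This is a minimal, assumed PyTorch implementation; the class name `PixelAdaptiveConv2d` and the Gaussian kernel choice are illustrative and not taken from the paper's released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelAdaptiveConv2d(nn.Module):
    """Minimal pixel-adaptive convolution sketch.

    Shared convolution weights are modulated per pixel by a Gaussian
    kernel on guidance features (e.g. activations from a fixed, pretrained
    semantic segmentation network). Illustrative only.
    """

    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.kernel_size = kernel_size
        self.padding = kernel_size // 2
        self.weight = nn.Parameter(
            torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_ch))

    def forward(self, x, guidance):
        # x:        (B, C_in, H, W) depth-network features
        # guidance: (B, C_g,  H, W) fixed semantic features (same resolution)
        B, C, H, W = x.shape
        k = self.kernel_size

        # Unfold both tensors into per-pixel k*k neighbourhoods.
        x_unf = F.unfold(x, k, padding=self.padding).view(B, C, k * k, H * W)
        g_unf = F.unfold(guidance, k, padding=self.padding)
        g_unf = g_unf.view(B, guidance.shape[1], k * k, H * W)

        # Gaussian kernel on guidance differences (centre pixel vs. neighbour).
        g_center = guidance.view(B, guidance.shape[1], 1, H * W)
        adapt = torch.exp(-0.5 * ((g_unf - g_center) ** 2).sum(dim=1, keepdim=True))

        # Reweight neighbourhoods, then apply the shared convolution weights.
        x_mod = (x_unf * adapt).view(B, C * k * k, H * W)
        w = self.weight.view(self.weight.shape[0], -1)          # (C_out, C_in*k*k)
        out = torch.einsum('oc,bcn->bon', w, x_mod) + self.bias.view(1, -1, 1)
        return out.view(B, -1, H, W)
```

In this reading, semantic features never receive gradients (the segmentation network stays fixed); they only shape where each convolution attends, which is how semantic structure guides the self-supervised geometric representation.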
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Monocular Depth Estimation | KITTI (Eigen) | Abs Rel | 0.1 | 502 |
| Depth Estimation | KITTI (Eigen split) | RMSE | 4.27 | 276 |
| Monocular Depth Estimation | KITTI (Eigen split) | Abs Rel | 0.102 | 193 |
| Monocular Depth Estimation | KITTI | Abs Rel | 0.1 | 161 |
| Monocular Depth Estimation | KITTI Raw Eigen (test) | RMSE | 4.27 | 159 |
| Depth Prediction | KITTI original ground truth (test) | Abs Rel | 0.102 | 38 |
| Depth Prediction | KITTI original (Eigen split) | Abs Rel | 0.102 | 29 |
| Depth Estimation | KITTI 2015 | Abs Rel | 0.102 | 21 |
| Relative Monocular Depth Estimation | KITTI raw (test) | Abs Rel Error | 0.102 | 8 |
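
For reference, Abs Rel and RMSE in the table follow the standard Eigen-split depth evaluation definitions; a short sketch of how they are computed (plain NumPy, not tied to the paper's evaluation code):

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard monocular depth metrics over valid (gt > 0) pixels.

    pred, gt: 1-D arrays of predicted and ground-truth depth in metres.
    """
    abs_rel = np.mean(np.abs(pred - gt) / gt)   # Abs Rel: mean relative error
    rmse = np.sqrt(np.mean((pred - gt) ** 2))   # RMSE: root mean squared error
    return abs_rel, rmse
```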