Semantically-Guided Representation Learning for Self-Supervised Monocular Depth

About

Self-supervised learning is showing great promise for monocular depth estimation, using geometry as the only source of supervision. Depth networks are indeed capable of learning representations that relate visual appearance to 3D properties by implicitly leveraging category-level patterns. In this work we investigate how to more directly leverage this semantic structure to guide geometric representation learning, while remaining in the self-supervised regime. Instead of using semantic labels and proxy losses in a multi-task approach, we propose a new architecture that leverages fixed, pretrained semantic segmentation networks to guide self-supervised representation learning via pixel-adaptive convolutions. Furthermore, we propose a two-stage training process that overcomes a common semantic bias on dynamic objects via resampling. Our method improves upon the state of the art for self-supervised monocular depth prediction over all pixels, on fine-grained details, and per semantic category.

Vitor Guizilini, Rui Hou, Jie Li, Rares Ambrus, Adrien Gaidon • 2020
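As a concrete illustration of the architecture described in the abstract, here is a minimal PyTorch sketch of a pixel-adaptive convolution: a standard convolution whose spatial kernel is reweighted, per pixel, by the similarity of guidance features coming from a fixed pretrained segmentation network. The layer name, tensor shapes, and the Gaussian kernel on guidance differences are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelAdaptiveConv2d(nn.Module):
    """Convolution whose k x k kernel is modulated per pixel by how similar
    each neighbor's guidance feature is to the center pixel's (sketch)."""

    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.k = kernel_size
        self.pad = kernel_size // 2
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_ch))

    def forward(self, x, guide):
        # x:     (B, C, H, W) depth-network features
        # guide: (B, G, H, W) frozen semantic features at the same resolution
        B, C, H, W = x.shape
        G, kk = guide.size(1), self.k * self.k
        # Gather the k*k neighborhood of every pixel in both tensors.
        x_un = F.unfold(x, self.k, padding=self.pad).view(B, C, kk, H * W)
        g_un = F.unfold(guide, self.k, padding=self.pad).view(B, G, kk, H * W)
        g_ctr = guide.view(B, G, 1, H * W)
        # Gaussian kernel on guidance differences: neighbors that are
        # semantically similar to the center pixel receive higher weight.
        kernel = torch.exp(-0.5 * (g_un - g_ctr).pow(2).sum(1, keepdim=True))
        # Modulate the neighborhood, then apply the shared learned weights.
        x_mod = (x_un * kernel).reshape(B, C * kk, H * W)
        w = self.weight.view(self.weight.size(0), -1)  # (out_ch, C*kk)
        out = torch.einsum('oc,bcl->bol', w, x_mod) + self.bias.view(1, -1, 1)
        return out.view(B, -1, H, W)

# Smoke test: depth features guided by (stand-in) segmentation features.
x = torch.randn(2, 32, 48, 160)
guide = torch.randn(2, 16, 48, 160)
print(PixelAdaptiveConv2d(32, 64)(x, guide).shape)  # torch.Size([2, 64, 48, 160])
```

The two-stage training could be approximated in the same spirit: after a first pass over the full dataset, fine-tune with a sampler that over-represents frames containing dynamic objects, so the photometric loss sees them more often. The helper below is a hypothetical sketch; `has_dynamic_object` and `boost` are invented names, and the per-frame flags would come from the frozen segmentation network's predictions.

```python
from torch.utils.data import WeightedRandomSampler

def make_dynamic_object_sampler(has_dynamic_object, boost=4.0):
    # has_dynamic_object: one bool per training frame (assumption: True when
    # the segmentation network detects e.g. car or pedestrian pixels).
    weights = [boost if flag else 1.0 for flag in has_dynamic_object]
    return WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
```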

Related benchmarks

Task                                | Dataset                            | Metric        | Value | Rank
Monocular Depth Estimation         | KITTI (Eigen)                      | Abs Rel       | 0.1   | 502
Depth Estimation                   | KITTI (Eigen split)                | RMSE          | 4.27  | 276
Monocular Depth Estimation         | KITTI (Eigen split)                | Abs Rel       | 0.102 | 193
Monocular Depth Estimation         | KITTI                              | Abs Rel       | 0.1   | 161
Monocular Depth Estimation         | KITTI Raw Eigen (test)             | RMSE          | 4.27  | 159
Depth Prediction                   | KITTI original ground truth (test) | Abs Rel       | 0.102 | 38
Depth Prediction                   | KITTI original (Eigen split)       | Abs Rel       | 0.102 | 29
Depth Estimation                   | KITTI 2015                         | Abs Rel       | 0.102 | 21
Relative Monocular Depth Estimation| KITTI raw (test)                   | Abs Rel Error | 0.102 | 8
