
Self-Supervised Monocular Depth Estimation with Internal Feature Fusion

About

Self-supervised learning for depth estimation uses geometry in image sequences for supervision and shows promising results. Like many computer vision tasks, depth network performance is determined by the capability to learn accurate spatial and semantic representations from images. Therefore, it is natural to exploit semantic segmentation networks for depth estimation. In this work, based on a well-developed semantic segmentation network HRNet, we propose a novel depth estimation network DIFFNet, which can make use of semantic information in down and upsampling procedures. By applying feature fusion and an attention mechanism, our proposed method outperforms the state-of-the-art monocular depth estimation methods on the KITTI benchmark. Our method also demonstrates greater potential on higher resolution training data. We propose an additional extended evaluation strategy by establishing a test set of challenging cases, empirically derived from the standard benchmark.

Hang Zhou, David Greenwood, Sarah Taylor • 2021

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Monocular Depth Estimation | KITTI (Eigen) | Abs Rel 0.097 | 502 |
| Depth Estimation | KITTI (Eigen split) | RMSE 4.345 | 276 |
| Monocular Depth Estimation | KITTI (Eigen split) | Abs Rel 0.094 | 193 |
| Monocular Depth Estimation | Make3D (test) | Abs Rel 0.298 | 132 |
| Monocular Depth Estimation | KITTI improved ground truth (Eigen split) | Abs Rel 0.066 | 65 |
| Depth Estimation | KITTI improved dense ground truth | Abs Rel 0.076 | 29 |
| Monocular Depth Estimation | KITTI Raw (Eigen) | Abs Rel 9.7 | 23 |
| Monocular Depth Estimation | DDAD | Abs Rel Error 0.205 | 17 |
| Depth Estimation | DrivingStereo Cloudy | AbsRel 14 | 14 |
| Depth Estimation | DrivingStereo Rainy | AbsRel 0.191 | 14 |
Showing 10 of 15 rows
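
For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how Abs Rel (mean absolute relative error) and RMSE are commonly computed for depth estimation. It is not taken from the paper or its code: the function name `depth_metrics`, the 80 m depth cap, and the optional median scaling (a common convention for scale-ambiguous monocular predictions) are assumptions made for illustration. Rows reporting values such as 9.7 or 14 may express the same ratio as a percentage, though the listing does not say so.

```python
import math
from statistics import median


def depth_metrics(pred, gt, min_depth=1e-3, max_depth=80.0, median_scale=False):
    """Return (abs_rel, rmse) for paired predicted / ground-truth depths.

    Hypothetical helper for illustration. The depth range and the median
    scaling option are assumed conventions, not values from the listing.
    """
    # Keep only pixels whose ground-truth depth lies in the valid range.
    pairs = [(p, g) for p, g in zip(pred, gt) if min_depth < g < max_depth]
    if not pairs:
        raise ValueError("no valid ground-truth depths")

    # Optionally align the prediction scale to the ground truth.
    scale = 1.0
    if median_scale:
        scale = median([g for _, g in pairs]) / median([p for p, _ in pairs])

    # Rescale and clamp predictions to the valid depth range.
    pairs = [(min(max(p * scale, min_depth), max_depth), g) for p, g in pairs]

    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    return abs_rel, rmse


if __name__ == "__main__":
    # Toy depths in metres, invented for this example.
    abs_rel, rmse = depth_metrics([10.0, 20.0, 33.0], [10.0, 25.0, 30.0])
    print(f"Abs Rel = {abs_rel:.3f}, RMSE = {rmse:.3f}")
```

Lower is better for both metrics. Abs Rel is unitless because each error is divided by the true depth, whereas RMSE is in the same units as the depth values and weights large errors more heavily.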

