Self-Supervised Monocular Depth Estimation with Internal Feature Fusion

About

Self-supervised learning for depth estimation uses the geometry of image sequences as its supervision signal and shows promising results. As in many computer vision tasks, the performance of a depth network depends on its ability to learn accurate spatial and semantic representations from images. It is therefore natural to exploit semantic segmentation networks for depth estimation. In this work, building on the well-established semantic segmentation network HRNet, we propose a novel depth estimation network, DIFFNet, which makes use of semantic information in both the downsampling and upsampling procedures. By applying feature fusion and an attention mechanism, our proposed method outperforms state-of-the-art monocular depth estimation methods on the KITTI benchmark. Our method also demonstrates greater potential on higher-resolution training data. We additionally propose an extended evaluation strategy by establishing a test set of challenging cases, empirically derived from the standard benchmark.
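As a rough illustration of what "feature fusion and an attention mechanism" can look like, the sketch below concatenates the channels of several feature branches and reweights them with a simplified squeeze-and-excitation-style channel gate. It is not DIFFNet's actual module: the function name `channel_attention_fuse`, the single linear layer, and the `weights` and `biases` parameters are all assumptions made for this example, and feature maps are flattened to plain Python lists so the code needs no dependencies.

```python
import math


def channel_attention_fuse(branches, weights, biases):
    """Fuse feature branches and reweight the channels (illustrative only).

    branches: list of branches; each branch is a list of channels; each
              channel is a flat list of floats (flattened spatial map).
    weights:  hypothetical C x C matrix of the gating layer, where C is the
              total number of channels across all branches.
    biases:   hypothetical list of C biases for the gating layer.
    """
    # Fusion: concatenate the channels of all branches.
    fused = [channel for branch in branches for channel in branch]

    # Squeeze: global average pooling gives one scalar per channel.
    pooled = [sum(channel) / len(channel) for channel in fused]

    # Excitation: one linear layer followed by a sigmoid gives a gate in (0, 1).
    gates = []
    for row, bias in zip(weights, biases):
        z = sum(w * p for w, p in zip(row, pooled)) + bias
        gates.append(1.0 / (1.0 + math.exp(-z)))

    # Scale: multiply every value in a channel by that channel's gate.
    return [[gate * v for v in channel] for gate, channel in zip(gates, fused)]


# Two single-channel branches; zero weights and biases give a gate of 0.5.
example = channel_attention_fuse(
    [[[2.0, 4.0]], [[1.0, 3.0]]],
    weights=[[0.0, 0.0], [0.0, 0.0]],
    biases=[0.0, 0.0],
)
```

In a trained network the gating weights would be learned, so informative channels receive gates close to 1 and less useful ones are suppressed.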

Hang Zhou, David Greenwood, Sarah Taylor • 2021

Related benchmarks

Task	Dataset	Result
Monocular Depth Estimation	KITTI (Eigen)	Abs Rel 0.097	523
Depth Estimation	KITTI (Eigen split)	RMSE 4.345	291
Monocular Depth Estimation	KITTI (Eigen split)	Abs Rel 0.094	215
Monocular Depth Estimation	Make3D (test)	Abs Rel 0.298	132
Monocular Depth Estimation	KITTI improved ground truth (Eigen split)	Abs Rel 0.066	65
Monocular Depth Estimation	KITTI Eigen (test)	AbsRel 0.102	56
Depth Estimation	KITTI improved dense ground truth	Abs Rel 0.076	29
Monocular Depth Estimation	KITTI Raw (Eigen)	Abs Rel 9.7	23
Monocular Depth Estimation	DDAD	Abs Rel Error 0.205	21
Monocular Depth Estimation	Waymo Open Dataset	AbsRel 0.149	15

Showing 10 of 17 rows
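The Abs Rel and RMSE figures in the table follow the standard depth-evaluation definitions: Abs Rel is the mean of |pred - gt| / gt, and RMSE is the square root of the mean squared error, both taken over pixels with valid ground truth. A minimal sketch of the two metrics is given below. The function name `depth_metrics` and the validity bounds `min_depth` and `max_depth` are assumptions for this example (an 80 m cap is a common choice for KITTI), and the median scaling that self-supervised methods often apply before evaluation is left out.

```python
import math


def depth_metrics(pred, gt, min_depth=1e-3, max_depth=80.0):
    """Return (abs_rel, rmse) over pixels whose ground truth is valid.

    pred, gt: flat sequences of predicted and ground-truth depths in metres.
    min_depth, max_depth: assumed validity range for ground-truth depths.
    """
    # Keep only pixels with ground truth inside the assumed valid range.
    pairs = [(p, g) for p, g in zip(pred, gt) if min_depth < g < max_depth]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")

    n = len(pairs)
    # Abs Rel: mean absolute error relative to the ground-truth depth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # RMSE: root of the mean squared error, in the same unit as the depths.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    return abs_rel, rmse


# The third pixel has no ground truth (0.0) and is ignored.
abs_rel, rmse = depth_metrics([2.0, 4.0, 7.0], [1.0, 5.0, 0.0])
```

Abs Rel is dimensionless, while RMSE carries the unit of the depth values, which is why the two are not directly comparable across rows of the table.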
