DeFeat-Net: General Monocular Depth via Simultaneous Unsupervised Representation Learning

About

In current monocular depth research, the dominant approach is to employ unsupervised training on large datasets, driven by warped photometric consistency. Such approaches lack robustness and are unable to generalize to challenging domains such as nighttime scenes or adverse weather conditions, where the assumptions underlying photometric consistency break down. We propose DeFeat-Net (Depth & Feature network), an approach that simultaneously learns a cross-domain dense feature representation alongside a robust depth-estimation framework based on warped feature consistency. The resulting feature representation is learned in an unsupervised manner, with no explicit ground-truth correspondences required. We show that within a single domain, our technique is comparable to both the current state of the art in monocular depth estimation and supervised feature representation learning. However, by simultaneously learning features, depth and motion, our technique is able to generalize to challenging domains, allowing DeFeat-Net to outperform the current state of the art with around a 10% reduction in all error measures on more challenging sequences such as nighttime driving.
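The core idea behind both losses is the same: warp one view into the other and penalize disagreement; the difference is whether the comparison is made on raw pixel intensities (photometric consistency) or on learned feature maps (feature consistency). The following is a minimal numpy sketch of that warped-consistency idea under heavy simplifying assumptions: integer 1D offsets with nearest-neighbour lookup instead of the differentiable bilinear sampling from depth and camera pose used in practice, and all names (`warp`, `consistency_loss`, `flow_x`) are illustrative, not from the paper.

```python
import numpy as np

def warp(image, flow_x):
    """Warp an image along x with integer per-pixel offsets (nearest neighbour).
    For each target pixel (r, c), sample the source at column c + flow_x[r, c],
    clamped to the image border."""
    h, w = image.shape[:2]
    cols = np.clip(np.arange(w)[None, :] + flow_x, 0, w - 1)
    rows = np.arange(h)[:, None]
    return image[rows, cols]

def consistency_loss(target, source, flow_x):
    """Mean absolute error between the target and the warped source.
    `target`/`source` can be raw intensities (photometric consistency)
    or dense feature maps (feature consistency); the loss is identical."""
    return np.abs(target - warp(source, flow_x)).mean()

# Toy example: a horizontal intensity ramp, with the "other view" shifted
# right by 2 pixels (edge pixels clamped).
target = np.tile(np.arange(8, dtype=float), (4, 1))
source = warp(target, np.full((4, 8), -2, dtype=int))

# The correct offset (+2) re-aligns the views, so the consistency loss is
# low; a wrong offset (0) leaves a large residual and is penalised.
good = consistency_loss(target, source, np.full((4, 8), 2, dtype=int))
bad = consistency_loss(target, source, np.zeros((4, 8), dtype=int))
assert good < bad
```

In an actual unsupervised depth pipeline, `flow_x` would be the reprojection induced by the predicted depth and relative pose, and the loss would be minimized end-to-end; replacing the intensity arrays with learned feature maps is what makes the objective robust when photometric assumptions fail (e.g. at night).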

Jaime Spencer, Richard Bowden, Simon Hadfield • 2020

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Monocular Depth Estimation | KITTI (Eigen) | Abs Rel | 0.126 | 502 |
| Stereo Matching | KITTI 2015 (test) | -- | -- | 144 |
| Stereo Matching | Middlebury (test) | 3PE | 6.95 | 47 |
| Stereo Matching | Inria SLFD | 3 Pixel Error | 10.47 | 41 |
| Stereo Matching | HCI (test) | 3PE | 5.13 | 35 |
| Stereo Matching | Middlebury | -- | -- | 34 |
| Monocular Depth Estimation | SCARED | Abs Rel | 0.077 | 24 |
| Depth Estimation | SCARED (test) | Abs Rel | 0.077 | 21 |
| Stereo Matching | HCI | 3PE | 12.13 | 6 |
