S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation

About

Humans can infer the 3D geometry of a scene from a sketch instead of a realistic image, which indicates that spatial structure plays a fundamental role in understanding the depth of scenes. We are the first to explore the learning of a depth-specific structural representation, which captures the essential features for depth estimation and ignores irrelevant style information. Our S2R-DepthNet (Synthetic to Real DepthNet) generalizes well to unseen real-world data even though it is trained only on synthetic data. S2R-DepthNet consists of: a) a Structure Extraction (STE) module, which extracts a domain-invariant structural representation from an image by disentangling the image into domain-invariant structure and domain-specific style components, b) a Depth-specific Attention (DSA) module, which learns task-specific knowledge to suppress depth-irrelevant structures for better depth estimation and generalization, and c) a depth prediction (DP) module, which predicts depth from the depth-specific representation. Without access to any real-world images, our method even outperforms state-of-the-art unsupervised domain adaptation methods that use real-world images of the target domain for training. In addition, when using a small amount of labeled real-world data, we achieve state-of-the-art performance under the semi-supervised setting. The code and trained models are available at https://github.com/microsoft/S2R-DepthNet.

Xiaotian Chen, Yuwang Wang, Xuejin Chen, Wenjun Zeng • 2021
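
The three modules named in the abstract can be read as a simple pipeline. The sketch below shows only that data flow and is not the authors' implementation. It assumes that the DSA module takes the input image and returns per-pixel weights, and that suppression is an elementwise product with the structure map; the abstract states neither detail. The names `s2r_forward`, `toy_ste`, `toy_dsa`, and `toy_dp` are hypothetical stand-ins for learned networks.

```python
from typing import Callable, List

Map2D = List[List[float]]  # toy single-channel map, one float per pixel


def s2r_forward(image: Map2D,
                ste: Callable[[Map2D], Map2D],
                dsa: Callable[[Map2D], Map2D],
                dp: Callable[[Map2D], Map2D]) -> Map2D:
    """Compose STE, DSA, and DP in the order the abstract lists them."""
    structure = ste(image)   # domain-invariant structure map
    attention = dsa(image)   # assumption: weights in [0, 1], predicted from the image
    # Assumption: depth-irrelevant structure is suppressed by elementwise gating.
    focused = [[s * a for s, a in zip(s_row, a_row)]
               for s_row, a_row in zip(structure, attention)]
    return dp(focused)       # depth predicted from the depth-specific representation


# Hypothetical stand-ins so the sketch runs; the real modules are trained networks.
def toy_ste(img: Map2D) -> Map2D:
    return [[abs(v) for v in row] for row in img]


def toy_dsa(img: Map2D) -> Map2D:
    return [[1.0 if abs(v) > 0.5 else 0.0 for v in row] for row in img]


def toy_dp(rep: Map2D) -> Map2D:
    return [[2.0 * v for v in row] for row in rep]


depth = s2r_forward([[0.2, -0.8], [1.0, 0.4]], toy_ste, toy_dsa, toy_dp)
```

Passing the modules in as callables keeps the composition separate from any particular network architecture, which the abstract does not describe.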

Related benchmarks

Task	Dataset	Result
Depth Estimation	KITTI (Eigen split)	RMSE 3.463	291
Depth Estimation	NYU Depth V2	RMSE 0.662	209
Monocular Depth Estimation	Make3D (test)	Abs Rel 0.49	132
Depth Prediction	Cityscapes (test)	RMSE 11.164	52
Monocular Depth Estimation	KITTI v1 (Eigen split)	Acc (δ < 1.25) 79.3	15
Depth Estimation	DrivingStereo v1 (test)	Acc (< 1.25) 73.7	3
Depth Estimation	nuScenes v1 (test)	delta < 1.25 60.1	3
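
The metrics in the table follow the usual monocular depth conventions. As a minimal sketch, assuming the standard definitions (RMSE as root mean squared error, Abs Rel as mean absolute error relative to ground truth, and threshold accuracy as the share of pixels with max(pred/gt, gt/pred) below 1.25), they could be computed as follows. The function names are illustrative, and each benchmark's exact evaluation protocol (depth caps, crops, valid-pixel masks) is not covered here.

```python
import math
from typing import Sequence


def rmse(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Root mean squared error over valid pixels."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean absolute error relative to the ground-truth depth."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)


def threshold_accuracy(pred: Sequence[float], gt: Sequence[float],
                       thresh: float = 1.25) -> float:
    """Fraction of pixels whose ratio max(pred/gt, gt/pred) is below thresh."""
    hits = sum(1 for p, g in zip(pred, gt) if max(p / g, g / p) < thresh)
    return hits / len(gt)


# Toy depths in arbitrary units; all values must be strictly positive.
pred = [1.0, 2.0, 4.0]
gt = [1.0, 2.0, 2.0]

rmse_value = rmse(pred, gt)
abs_rel_value = abs_rel(pred, gt)
acc_value = threshold_accuracy(pred, gt)
```

`threshold_accuracy` returns a fraction; the accuracy figures in the table appear to be percentages, so multiply by 100 before comparing.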
