
RM-Depth: Unsupervised Learning of Recurrent Monocular Depth in Dynamic Scenes

About

Unsupervised methods have shown promising results on monocular depth estimation. However, the training data must be captured in scenes without moving objects. To push the envelope of accuracy, recent methods tend to increase their model parameters. In this paper, an unsupervised learning framework is proposed to jointly predict monocular depth and complete 3D motion, including the motions of moving objects and the camera. (1) Recurrent modulation units are used to adaptively and iteratively fuse encoder and decoder features. This improves single-image depth inference without inflating the parameter count. (2) Instead of a single set of filters for upsampling, multiple sets of filters are devised for residual upsampling. This facilitates the learning of edge-preserving filters and leads to improved performance. (3) A warping-based network is used to estimate a motion field of moving objects without using semantic priors. This removes the scene-rigidity requirement and allows general videos to be used for unsupervised learning. The motion field is further regularized by an outlier-aware training loss. Although the depth model uses only a single image at test time and just 2.97M parameters, it achieves state-of-the-art results on the KITTI and Cityscapes benchmarks.
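The recurrent modulation idea in point (1) can be pictured as a small gated unit that repeatedly re-weights an encoder feature using a hidden state carried across iterations. The sketch below is a hypothetical, heavily simplified illustration (dense weights instead of convolutions; all names and shapes are assumptions, not the paper's actual RM-Depth architecture):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RecurrentModulationUnit:
    """Illustrative gated fusion of an encoder feature with a hidden state.

    Each iteration computes a gate from the encoder feature and the current
    hidden state, modulates the encoder feature with that gate, and updates
    the hidden state. Reusing the same weights across iterations is what
    keeps the parameter count small. (Hypothetical sketch only.)
    """

    def __init__(self, channels, seed=0):
        rng = np.random.default_rng(seed)
        # Real models would use convolutions; dense weights keep this minimal.
        self.w_gate = rng.standard_normal((2 * channels, channels)) * 0.1
        self.w_out = rng.standard_normal((2 * channels, channels)) * 0.1

    def step(self, enc_feat, hidden):
        # Gate in [0, 1] controls how strongly the encoder feature passes through.
        gate = sigmoid(np.concatenate([enc_feat, hidden], axis=-1) @ self.w_gate)
        modulated = gate * enc_feat
        # New hidden state mixes the modulated feature with the old state.
        hidden = np.tanh(np.concatenate([modulated, hidden], axis=-1) @ self.w_out)
        return modulated, hidden

# Iterative refinement: the same encoder feature is re-modulated each pass.
rmu = RecurrentModulationUnit(channels=8)
enc = np.ones(8)
h = np.zeros(8)
for _ in range(3):
    mod, h = rmu.step(enc, h)
print(mod.shape)  # (8,)
```

Because the unit's weights are shared over iterations, deeper refinement costs compute but no extra parameters, which matches the paper's emphasis on a compact 2.97M-parameter depth model.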

Tak-Wai Hui • 2023

Related benchmarks

Task                          Dataset             Metric                    Result   Rank
Monocular Depth Estimation    KITTI (Eigen)       Abs Rel                   0.105    502
Monocular Depth Estimation    KITTI               Abs Rel                   0.108    161
Optical Flow                  KITTI 2015 (test)   --                        --       95
Monocular Depth Estimation    Cityscapes          Accuracy (delta < 1.25)   91.3     62
Motion Segmentation           KITTI 2015 (test)   Overall IoU               72.12    5

Other info

Code
