Self-Supervised Monocular Scene Flow Estimation

About

Scene flow estimation has been receiving increasing attention for 3D environment perception. Monocular scene flow estimation (obtaining 3D structure and 3D motion from two temporally consecutive images) is a highly ill-posed problem, and practical solutions are lacking to date. We propose a novel monocular scene flow method that yields competitive accuracy and real-time performance. By taking an inverse problem view, we design a single convolutional neural network (CNN) that successfully estimates depth and 3D motion simultaneously from a classical optical flow cost volume. We adopt self-supervised learning with 3D loss functions and occlusion reasoning to leverage unlabeled data. We validate our design choices, including the proxy loss and augmentation setup. Our model achieves state-of-the-art accuracy among unsupervised/self-supervised learning approaches to monocular scene flow, and yields competitive results for the optical flow and monocular depth estimation sub-tasks. Semi-supervised fine-tuning further improves the accuracy and yields promising results in real time.
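The inverse-problem view rests on a simple geometric link. A pixel with known depth can be lifted to 3D, moved by a scene-flow vector, and reprojected, and the displacement of its projection is the induced optical flow. The following is a minimal sketch of that link plus a simplified 3D point-distance residual of the kind a self-supervised 3D loss can use. It assumes an ideal pinhole camera. `Intrinsics`, `induced_optical_flow`, `point_reconstruction_error`, the `depth2_at` sampler, and the intrinsics values are hypothetical illustrations, not the authors' code or their exact loss.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class Intrinsics:
    """Hypothetical pinhole intrinsics: focal lengths and principal point, in pixels."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(u: float, v: float, depth: float, K: Intrinsics) -> Point3D:
    """Lift pixel (u, v) with metric depth to a 3D point in camera coordinates."""
    return ((u - K.cx) * depth / K.fx, (v - K.cy) * depth / K.fy, depth)


def project(p: Point3D, K: Intrinsics) -> Tuple[float, float]:
    """Project a 3D camera-space point back onto the image plane."""
    x, y, z = p
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    return (K.fx * x / z + K.cx, K.fy * y / z + K.cy)


def move(p: Point3D, sf: Point3D) -> Point3D:
    """Apply a 3D scene-flow vector (in meters) to a point."""
    return (p[0] + sf[0], p[1] + sf[1], p[2] + sf[2])


def induced_optical_flow(u: float, v: float, depth: float, sf: Point3D,
                         K: Intrinsics) -> Tuple[float, float]:
    """2D optical flow at pixel (u, v) implied by its depth and 3D scene flow."""
    u2, v2 = project(move(backproject(u, v, depth, K), sf), K)
    return (u2 - u, v2 - v)


def point_reconstruction_error(u: float, v: float, depth: float, sf: Point3D,
                               depth2_at: Callable[[float, float], float],
                               K: Intrinsics) -> float:
    """Simplified 3D residual (hypothetical): distance between the moved point and
    the point lifted from second-frame depth at the pixel it projects to.
    `depth2_at` is an assumed callable that samples the second frame's depth."""
    moved = move(backproject(u, v, depth, K), sf)
    u2, v2 = project(moved, K)
    target = backproject(u2, v2, depth2_at(u2, v2), K)
    return math.dist(moved, target)


# Usage with hypothetical, roughly KITTI-like intrinsics.
K = Intrinsics(fx=720.0, fy=720.0, cx=620.0, cy=187.0)
du, dv = induced_optical_flow(700.0, 200.0, 10.0, (0.5, 0.0, -1.0), K)
# du ≈ 48.89 px, dv ≈ 1.44 px
```

Optical flow alone does not pin down depth and 3D motion separately, which is why the monocular problem is ill-posed. A residual like the one above is zero only when the predicted depth and scene flow agree with the geometry of the second frame.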

Junhwa Hur, Stefan Roth (2020)

Related benchmarks

Task	Dataset	Metric	Result
Optical Flow Estimation	KITTI 2015 (train)	Fl-epe	7.51	446
Depth Estimation	KITTI (Eigen split)	RMSE	4.877	291
Monocular Depth Estimation	KITTI	Abs Rel	0.106	220
Optical Flow	KITTI 2015 (test)	Fl Error (All)	23.54	122
Scene Flow Estimation	KITTI	EPE (m)	0.454	64
Scene Flow	KITTI Scene Flow 2015 (test)	D1 Score (All)	34.02	28
Scene Flow	KITTI Scene Flow (test)	D1 Error (all)	22.16	25
Scene Flow	KITTI Scene Flow (train)	D1-all	31.25	11
Scene Flow Estimation	KITTI 67 (test)	EPE (Endpoint Error)	0.454	10
Depth Estimation	KITTI 67 (test)	AbsR-r	0.1	10
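The table mixes endpoint-error metrics (Fl-epe, EPE) with outlier percentages (Fl, D1). As a rough guide to reading them, here is a minimal sketch that computes mean endpoint error and an outlier rate using the KITTI 2015 criterion, under which an estimate counts as an outlier when its error exceeds both 3 px and 5% of the ground-truth magnitude. It is a simplified illustration over lists of valid pixels, not the official KITTI devkit; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Flow = Tuple[float, float]


def endpoint_error(est: Flow, gt: Flow) -> float:
    """Euclidean distance between estimated and ground-truth flow vectors (px)."""
    return math.hypot(est[0] - gt[0], est[1] - gt[1])


def is_outlier(err: float, gt_magnitude: float,
               abs_thresh: float = 3.0, rel_thresh: float = 0.05) -> bool:
    """KITTI-2015-style criterion: wrong if error > 3 px AND > 5% of |gt|."""
    return err > abs_thresh and err > rel_thresh * gt_magnitude


def flow_metrics(est: Sequence[Flow], gt: Sequence[Flow]) -> Tuple[float, float]:
    """Return (mean EPE in px, Fl outlier rate in %) over valid pixels."""
    if len(est) != len(gt) or not gt:
        raise ValueError("need equally long, non-empty sequences")
    errors = [endpoint_error(e, g) for e, g in zip(est, gt)]
    outliers = sum(is_outlier(err, math.hypot(*g)) for err, g in zip(errors, gt))
    return sum(errors) / len(errors), 100.0 * outliers / len(errors)


def d1_rate(est: Sequence[float], gt: Sequence[float]) -> float:
    """Disparity outlier rate in % (the D1 metric), using the same criterion."""
    if len(est) != len(gt) or not gt:
        raise ValueError("need equally long, non-empty sequences")
    outliers = sum(is_outlier(abs(d - g), abs(g)) for d, g in zip(est, gt))
    return 100.0 * outliers / len(gt)


epe, fl = flow_metrics(est=[(10, 0), (104, 0), (2, 0), (60, 0)],
                       gt=[(10, 0), (100, 0), (0, 0), (50, 0)])
# epe == 4.0, fl == 25.0: the 4 px error at |gt| = 100 px stays under 5%
d1 = d1_rate(est=[54.0, 20.0, 83.5], gt=[50.0, 20.0, 80.0])
```

Because the relative threshold grows with motion or disparity magnitude, a few pixels of error on a fast-moving or nearby pixel may not count as an outlier, so EPE and outlier-rate metrics can rank methods differently.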

(10 of 27 related benchmark results shown.)

