Temporally Consistent Online Depth Estimation Using Point-Based Fusion

About

Depth estimation is an important step in many computer vision problems such as 3D reconstruction, novel view synthesis, and computational photography. Most existing work focuses on depth estimation from single frames. When applied to videos, the result lacks temporal consistency, showing flickering and swimming artifacts. In this paper, we aim to estimate temporally consistent depth maps of video streams in an online setting. This is a difficult problem, as future frames are not available and the method must choose between enforcing consistency and correcting errors from previous estimates. The presence of dynamic objects further complicates the problem. We propose to address these challenges by using a global point cloud that is dynamically updated each frame, along with a learned fusion approach in image space. Our approach encourages consistency while simultaneously allowing updates to handle errors and dynamic objects. Qualitative and quantitative results show that our method achieves state-of-the-art quality for consistent video depth estimation.
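
To make the idea in the abstract concrete, here is a minimal NumPy sketch of one online update step: a point cloud is reprojected into the current view, fused with the current per-frame depth estimate in image space, and then rebuilt. This is an illustration under stated assumptions, not the authors' implementation. The camera intrinsics `K` and the pose `cam_from_world` are assumed to be known, the paper's learned fusion is replaced by a fixed blend weight with an agreement threshold, and the cloud is simply rebuilt from the fused depth each frame. All function names and parameters (`backproject`, `project`, `fuse`, `step`, `alpha`, `rel_tol`) are hypothetical.

```python
import numpy as np

def backproject(depth, K):
    """Lift a depth map (H, W) to camera-space points (H*W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T          # rays with unit z
    return rays * depth.reshape(-1, 1)

def project(points, K, shape):
    """Z-buffer camera-space points (N, 3) into a depth map; 0 marks holes."""
    h, w = shape
    pts = points[points[:, 2] > 1e-6]        # keep points in front of the camera
    z = pts[:, 2]
    uvw = pts @ K.T
    u = np.round(uvw[:, 0] / z).astype(int)
    v = np.round(uvw[:, 1] / z).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth = np.full((h, w), np.inf)
    # Keep the nearest point per pixel.
    np.minimum.at(depth, (v[inside], u[inside]), z[inside])
    depth[np.isinf(depth)] = 0.0
    return depth

def fuse(prior, current, alpha=0.8, rel_tol=0.1):
    """Hand-written stand-in for learned fusion (assumption, not the paper's network).

    Where the reprojected prior is valid and agrees with the current estimate,
    blend toward the prior for consistency; elsewhere (holes, dynamic objects,
    large disagreements) take the current estimate so errors can be corrected.
    """
    agree = (prior > 0) & (np.abs(prior - current) <= rel_tol * current)
    return np.where(agree, alpha * prior + (1 - alpha) * current, current)

def step(cloud_world, depth_now, K, cam_from_world):
    """One online update: reproject the cloud, fuse, rebuild the cloud."""
    R, t = cam_from_world[:3, :3], cam_from_world[:3, 3]
    cloud_cam = cloud_world @ R.T + t        # world -> camera
    prior = project(cloud_cam, K, depth_now.shape)
    fused = fuse(prior, depth_now)
    new_cloud = (backproject(fused, K) - t) @ R   # camera -> world: R^T (p - t)
    return fused, new_cloud
```

The fixed `alpha` and `rel_tol` make the trade-off named in the abstract explicit: a larger `alpha` enforces more consistency with past frames, while the agreement test lets the current estimate override the prior where the scene has changed.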

Numair Khan, Eric Penner, Douglas Lanman, Lei Xiao • 2023

Related benchmarks

Task	Dataset	Result
Surface Reconstruction	ADT dataset	Accuracy 60.3	10
Surface Reconstruction	ASE (val)	Accuracy 34.9	10
Monocular Depth Estimation	ScanNet monocular variant 20 60-frame sequences	OPW 0.011	7
Monocular Depth Estimation	MPI Sintel Final (train)	OPW 0.255	4

