Multi-View Stereo by Temporal Nonparametric Fusion

About

We propose a novel approach to depth estimation from multi-view image-pose pairs, in which the model can leverage information from previous latent-space encodings of the scene. The model takes pairs of images and poses and passes them through an encoder-decoder network for disparity estimation. The novelty lies in soft-constraining the bottleneck layer with a nonparametric Gaussian process (GP) prior. We propose a pose-kernel structure that encourages similar poses to have similar latent representations. The flexibility of the GP prior provides an adaptive memory for fusing information from previous views. We train the encoder-decoder and the GP hyperparameters jointly, end to end. In addition to a batch method, we derive a lightweight estimation scheme that circumvents standard pitfalls in scaling Gaussian process inference, and we demonstrate that the scheme can run in real time on smart devices.
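The latent-space fusion described in the abstract can be illustrated with a small batch GP regression over camera poses. The sketch below is not the authors' implementation. It assumes a pose distance that combines translation and rotation differences, a Matérn-3/2 kernel on that distance, a zero-mean GP applied independently to each latent dimension, and fixed hyperparameters (in the paper they are learned jointly with the network). All function names and parameter values are hypothetical.

```python
import math

def pose_distance(pose_a, pose_b):
    """Distance between poses (R, t): R is a 3x3 rotation (nested lists), t a 3-vector.
    Assumed form: sqrt(||t_a - t_b||^2 + (2/3) * trace(I - R_a^T R_b))."""
    (R_a, t_a), (R_b, t_b) = pose_a, pose_b
    trans_sq = sum((x - y) ** 2 for x, y in zip(t_a, t_b))
    # trace(R_a^T R_b) equals the sum of elementwise products
    trace = sum(R_a[i][j] * R_b[i][j] for i in range(3) for j in range(3))
    rot_sq = (2.0 / 3.0) * (3.0 - trace)
    return math.sqrt(max(trans_sq + rot_sq, 0.0))

def matern32(d, variance=1.0, lengthscale=1.0):
    """Matern-3/2 covariance as a function of distance d (assumed kernel choice)."""
    s = math.sqrt(3.0) * d / lengthscale
    return variance * (1.0 + s) * math.exp(-s)

def solve(A, B):
    """Solve A X = B (A: n x n, B: n x m) by Gaussian elimination with partial pivoting."""
    n, m = len(A), len(B[0])
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + m):
                M[r][c] -= f * M[col][c]
    X = [[0.0] * m for _ in range(n)]
    for i in reversed(range(n)):
        for j in range(m):
            s = M[i][n + j] - sum(M[i][k] * X[k][j] for k in range(i + 1, n))
            X[i][j] = s / M[i][i]
    return X

def gp_fuse_latents(poses, latents, query_pose, noise=1e-2):
    """GP posterior mean of the latent code at query_pose, given earlier (pose, latent) pairs."""
    n = len(poses)
    K = [[matern32(pose_distance(poses[i], poses[j])) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, latents)  # (K + noise * I)^{-1} Y
    k_star = [matern32(pose_distance(query_pose, p)) for p in poses]
    dim = len(latents[0])
    return [sum(k_star[i] * alpha[i][d] for i in range(n)) for d in range(dim)]

# Toy usage: two nearby views share a latent code, a distant view has a different one.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
poses = [(I3, [0.0, 0.0, 0.0]), (I3, [0.1, 0.0, 0.0]), (I3, [5.0, 0.0, 0.0])]
latents = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
fused = gp_fuse_latents(poses, latents, (I3, [0.05, 0.0, 0.0]))
print(fused)
```

In this toy setting the query pose lies between the two nearby views, so the fused latent code stays close to theirs, while the distant view contributes almost nothing. The batch solve grows cubically with the number of views, which is the kind of scaling issue the lightweight scheme mentioned in the abstract is meant to avoid.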

Yuxin Hou, Juho Kannala, Arno Solin • 2019

Related benchmarks

Task	Dataset	Result
3D Geometry Reconstruction	ScanNet	Accuracy 7.9	54
2D Depth Estimation	ScanNet	AbsRel 0.062	26
3D Scene Reconstruction	ScanNet v2 (test)	Accuracy 0.162	26
Depth Estimation	TUM-RGBD	Abs Rel Error 0.093	16
3D Geometry Reconstruction	ScanNet (Atlas split)	Completeness 0.031	11
3D Reconstruction	TUM-RGBD	F-score 17	11
Depth Estimation	ICL-NUIM	Abs Rel Error 0.066	11
3D Reconstruction	ICL-NUIM	F-score 32.3	11
Depth Estimation	ScanNet v2 (test)	Abs Diff 0.1494	10
Depth Estimation	ScanNet keyframes v2 (test)	Abs Diff 0.1494	9

Showing 10 of 13 rows
