DeepV2D: Video to Depth with Differentiable Structure from Motion
About
We propose DeepV2D, an end-to-end deep learning architecture for predicting depth from video. DeepV2D combines the representational power of neural networks with the geometric principles governing image formation. We compose a collection of classical geometric algorithms, convert them into trainable modules, and combine them into an end-to-end differentiable architecture. DeepV2D interleaves two stages: motion estimation and depth estimation. During inference, the two stages are alternated and converge to an accurate depth estimate. Code is available at https://github.com/princeton-vl/DeepV2D.
Zachary Teed, Jia Deng • 2018
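The alternating inference scheme can be illustrated with a toy sketch. The two block updates below are hypothetical stand-ins for the paper's neural motion and depth modules, not the actual networks: each is the closed-form minimizer of a small quadratic objective in one variable with the other held fixed, and alternating them converges to the joint minimum, mirroring how DeepV2D alternates motion and depth estimation until the depth converges.

```python
# Toy coupled objective standing in for the photometric/geometric loss:
#   E(m, z) = (m - z - 1)^2 + m^2 + z^2
# where m plays the role of "motion" and z the role of "depth".
# Its joint minimum is at m = 1/3, z = -1/3.

def motion_step(z):
    # Minimize E over m with z fixed: dE/dm = 2(m - z - 1) + 2m = 0.
    return (z + 1.0) / 2.0

def depth_step(m):
    # Minimize E over z with m fixed: dE/dz = -2(m - z - 1) + 2z = 0.
    return (m - 1.0) / 2.0

def alternate(n_iters=30, z=0.0):
    """Alternate motion and depth updates, as DeepV2D alternates its
    motion and depth modules at inference time, until convergence."""
    for _ in range(n_iters):
        m = motion_step(z)   # update motion given current depth
        z = depth_step(m)    # update depth given current motion
    return m, z
```

Each step strictly decreases the objective, so the alternation is a block coordinate descent that converges geometrically to the joint optimum; in DeepV2D the analogous steps are learned modules rather than closed-form updates.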
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Depth Estimation | KITTI (Eigen split) | RMSE | 2.483 | 276 |
| Monocular Depth Estimation | KITTI (test) | Abs Rel Error | 0.037 | 103 |
| Depth Estimation | ScanNet (test) | Abs Rel | 0.057 | 65 |
| Video Depth Estimation | Sintel (test) | Delta 1 Accuracy | 50.9 | 57 |
| Visual-Inertial Odometry | EuRoC (All sequences) | MH1 Error | 0.739 | 51 |
| Camera pose estimation | TUM freiburg1 | Rotation Error | 0.105 | 34 |
| Visual Odometry | TUM-RGBD | freiburg1/xyz Error | 0.15 | 34 |
| 3D Reconstruction | 7 Scenes | -- | -- | 32 |
| Video Depth Estimation | KITTI (test) | Delta 1 | 97.2 | 25 |
| Video Depth Estimation | VDW (test) | Delta 1 | 54.6 | 24 |
Showing 10 of 34 rows