DeepV2D: Video to Depth with Differentiable Structure from Motion
About
We propose DeepV2D, an end-to-end deep learning architecture for predicting depth from video. DeepV2D combines the representation ability of neural networks with the geometric principles governing image formation. We compose a collection of classical geometric algorithms, which are converted into trainable modules and combined into an end-to-end differentiable architecture. DeepV2D interleaves two stages: motion estimation and depth estimation. During inference, motion and depth estimation are alternated and converge to an accurate depth estimate. Code is available at https://github.com/princeton-vl/DeepV2D.
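The alternating inference described above can be sketched as a simple fixed-point loop. This is an illustrative toy, not the authors' implementation: `motion_step` and `depth_step` below are trivial stand-ins for DeepV2D's learned motion and depth modules, and all names here are hypothetical.

```python
import numpy as np

def motion_step(frames, depth):
    # Hypothetical stand-in for the learned motion module, which in
    # DeepV2D refines camera poses given the current depth estimate.
    return [f.mean() * depth.mean() for f in frames]

def depth_step(frames, poses):
    # Hypothetical stand-in for the learned depth module, which
    # re-estimates keyframe depth given the current camera poses.
    return np.full_like(frames[0], 1.0 + 0.1 * np.mean(poses))

def deepv2d_inference(frames, num_iters=8, tol=1e-6):
    """Alternate motion and depth estimation until depth converges."""
    depth = np.ones_like(frames[0])            # flat depth initialization
    poses = None
    for _ in range(num_iters):
        poses = motion_step(frames, depth)     # motion step: hold depth fixed
        new_depth = depth_step(frames, poses)  # depth step: hold poses fixed
        if np.abs(new_depth - depth).mean() < tol:
            return new_depth, poses            # converged
        depth = new_depth
    return depth, poses

frames = [np.random.rand(4, 4) for _ in range(3)]
depth, poses = deepv2d_inference(frames)
```

The key point is structural: each stage consumes the other's latest output, and because both stages are differentiable modules, the whole loop can be trained end to end.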
Zachary Teed, Jia Deng • 2018
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Depth Estimation | KITTI (Eigen split) | RMSE | 2.483 | 291 |
| Monocular Depth Estimation | KITTI (test) | Abs Rel Error | 0.037 | 103 |
| 3D Reconstruction | 7 Scenes | -- | -- | 94 |
| Depth Estimation | ScanNet (test) | Abs Rel | 0.057 | 65 |
| Visual-Inertial Odometry | EuRoC (All sequences) | MH1 Error | 0.739 | 62 |
| Video Depth Estimation | Sintel (test) | Delta 1 Accuracy | 50.9 | 61 |
| Visual Odometry | TUM-RGBD | freiburg1/desk2 Error | 0.633 | 37 |
| Absolute Trajectory Estimation | TUM RGB-D | Desk Error | 0.166 | 36 |
| Camera pose estimation | TUM freiburg1 | Rotation Error | 0.105 | 34 |
| Camera pose estimation | TUM RGB-D | Error (desk) | 0.166 | 26 |
Showing 10 of 37 benchmark rows.