FlowMap: High-Quality Camera Poses, Intrinsics, and Depth via Gradient Descent
About
This paper introduces FlowMap, an end-to-end differentiable method that solves for precise camera poses, camera intrinsics, and per-frame dense depth of a video sequence. Our method performs per-video gradient-descent minimization of a simple least-squares objective that compares the optical flow induced by depth, intrinsics, and poses against correspondences obtained via off-the-shelf optical flow and point tracking. Alongside the use of point tracks to encourage long-term geometric consistency, we introduce differentiable re-parameterizations of depth, intrinsics, and pose that are amenable to first-order optimization. We empirically show that camera parameters and dense depth recovered by our method enable photo-realistic novel view synthesis on 360-degree trajectories using Gaussian Splatting. Our method not only far outperforms prior gradient-descent-based bundle adjustment methods, but surprisingly performs on par with COLMAP, the state-of-the-art SfM method, on the downstream task of 360-degree novel view synthesis (even though our method is purely gradient-descent-based, fully differentiable, and presents a complete departure from conventional SfM).
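To make the objective concrete, the following is a minimal sketch of how optical flow can be induced from depth, intrinsics, and a relative pose, and then compared against observed flow with a squared error. It is not the authors' implementation: the pinhole camera model, the function names `induced_flow` and `flow_loss`, and the per-pixel (rather than dense, batched) formulation are all assumptions made for illustration, and the paper's re-parameterizations and point-track terms are omitted.

```python
def induced_flow(pixel, depth, f, c, R, t):
    """Flow at one pixel of frame i induced by depth, intrinsics, and pose.

    Hypothetical helper for illustration (assumes a pinhole camera).
    pixel: (x, y) location in frame i
    depth: z-depth of that pixel in frame i
    f: (fx, fy) focal lengths, c: (cx, cy) principal point
    R (3x3 nested lists), t (length 3): map frame-i camera coordinates
    to frame-j camera coordinates
    """
    x, y = pixel
    # Unproject the pixel to a 3D point in frame i's camera coordinates.
    p = [depth * (x - c[0]) / f[0], depth * (y - c[1]) / f[1], depth]
    # Move the point into frame j's camera coordinates.
    q = [sum(R[r][k] * p[k] for k in range(3)) + t[r] for r in range(3)]
    # Project into frame j and subtract the original pixel location.
    xj = f[0] * q[0] / q[2] + c[0]
    yj = f[1] * q[1] / q[2] + c[1]
    return (xj - x, yj - y)


def flow_loss(pixels, depths, f, c, R, t, target_flows):
    """Mean squared difference between induced and observed flow."""
    total = 0.0
    for pixel, depth, target in zip(pixels, depths, target_flows):
        u, v = induced_flow(pixel, depth, f, c, R, t)
        total += (u - target[0]) ** 2 + (v - target[1]) ** 2
    return total / len(pixels)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Points shift 0.1 units along +x in camera coordinates between frames;
# the pixel has depth 2 and the focal length is 100 pixels.
flow = induced_flow(
    (40.0, 30.0), 2.0, (100.0, 100.0), (32.0, 24.0), IDENTITY, [0.1, 0.0, 0.0]
)
```

In this toy case the induced flow is horizontal with magnitude `fx * tx / depth`, so nearer pixels move further. In a per-video optimization of the kind the abstract describes, a loss like `flow_loss` would be minimized with respect to depth, intrinsics, and poses using an automatic-differentiation framework; this sketch only evaluates it.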
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Structure-from-Motion | Tanks&Temples | Registration Score | 0.667 | 15 |
| Multi-View Pose Estimation | Tanks&Temples 25-view | RRA@5 | 0.7 | 9 |
| Multi-View Pose Estimation | Tanks&Temples 50-view | RRA@5 | 1.9 | 9 |
| Multi-View Pose Estimation | Tanks&Temples 100-view | RRA@5 | 6.8 | 9 |
| Multi-View Pose Estimation | Tanks&Temples 200-view | RRA@5 | 22.2 | 9 |
| Multi-View Pose Estimation | Tanks&Temples (full sequence) | Registration Error | 66.7 | 8 |
| Structure-from-Motion | ETH3D 59 (test) | RRA (@5°) | 10.1 | 7 |