Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer
About
We address the task of estimating camera parameters from a set of images depicting a scene. Popular feature-based structure-from-motion (SfM) tools solve this task by incremental reconstruction: they alternate between triangulating sparse 3D points and registering further camera views to the sparse point cloud. We re-interpret incremental structure-from-motion as an iterated application and refinement of a visual relocalizer, that is, of a method that registers new views to the current state of the reconstruction. This perspective allows us to investigate alternative visual relocalizers that are not rooted in local feature matching. We show that scene coordinate regression, a learning-based relocalization approach, allows us to build implicit, neural scene representations from unposed images. Unlike other learning-based reconstruction methods, we require neither pose priors nor sequential inputs, and we optimize efficiently over thousands of images. In many cases, our method, ACE0, estimates camera poses with an accuracy close to that of feature-based SfM, as demonstrated by novel view synthesis. Project page: https://nianticlabs.github.io/acezero/
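The "iterated application and refinement of a visual relocalizer" can be pictured as a simple loop: fit a relocalizer to the views registered so far, try to register the remaining views, keep the confident ones, and repeat. The sketch below is only an illustration of that control flow and is not the ACE0 implementation. The names `incremental_reconstruction`, `train`, `relocalize`, and `min_confidence` are hypothetical, and the toy "relocalizer" at the bottom is a stand-in that registers an image whenever a neighbouring image is already registered.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, Tuple


def incremental_reconstruction(
    image_ids: Iterable[Hashable],
    seed_poses: Dict[Hashable, Any],
    train: Callable[[Dict[Hashable, Any]], Any],
    relocalize: Callable[[Any, Hashable], Tuple[Any, float]],
    min_confidence: float,
    max_rounds: int = 10,
) -> Dict[Hashable, Any]:
    """Alternate between fitting a relocalizer and registering new views.

    All callables are hypothetical placeholders supplied by the caller.
    """
    image_ids = list(image_ids)
    poses = dict(seed_poses)  # views registered so far
    for _ in range(max_rounds):
        # Refine the relocalizer on the current state of the reconstruction.
        model = train(poses)
        newly_registered = {}
        for image_id in image_ids:
            if image_id in poses:
                continue
            pose, confidence = relocalize(model, image_id)
            if confidence >= min_confidence:
                newly_registered[image_id] = pose
        if not newly_registered:
            break  # no further view could be registered
        poses.update(newly_registered)
    return poses


# Toy stand-ins (assumptions for illustration only): images are integers on a
# line, the "model" is the set of registered ids, and an image is registered
# confidently only if an adjacent id is already part of the model.
def toy_train(poses):
    return set(poses)


def toy_relocalize(model, image_id):
    has_neighbour = (image_id - 1) in model or (image_id + 1) in model
    return float(image_id), 1.0 if has_neighbour else 0.0


result = incremental_reconstruction(
    image_ids=[0, 1, 2, 3, 10],
    seed_poses={0: 0.0},
    train=toy_train,
    relocalize=toy_relocalize,
    min_confidence=0.5,
)
```

In the toy run, images 1, 2, and 3 are registered one round at a time, while image 10 has no registered neighbour and stays unposed. A real system would replace the stand-ins with an actual relocalizer and would typically also refine the poses of already registered views.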
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Structure-from-Motion | Tanks&Temples | Registration Score | 1 | 15 |
| Multi-View Pose Estimation | Tanks&Temples 200-view | RRA@5 | 55.7 | 9 |
| Multi-View Pose Estimation | Tanks&Temples 100-view | RRA@5 | 27.3 | 9 |
| Multi-View Pose Estimation | Tanks&Temples 25-view | RRA@5 | 1.2 | 9 |
| Multi-View Pose Estimation | Tanks&Temples 50-view | RRA@5 | 11.9 | 9 |
| Multi-View Pose Estimation | Tanks&Temples (full sequence) | Registration Error | 100 | 8 |
| Structure-from-Motion | ETH3D 59 (test) | RRA (@5°) | 16.4 | 7 |
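For readers unfamiliar with the RRA@5 metric in the table, the following is a minimal sketch of how relative rotation accuracy at a 5° threshold is commonly computed: for every pair of cameras, compare the predicted relative rotation with the ground-truth one and report the percentage of pairs whose angular error is below the threshold. This assumes rotations are given as 3×3 matrices and that the relative rotation is formed as `R_i.T @ R_j`. The benchmarks listed above may use a different convention or evaluation protocol, and the function names `rotation_angle_deg`, `rra`, and `rot_z` are hypothetical.

```python
import itertools

import numpy as np


def rotation_angle_deg(R: np.ndarray) -> float:
    """Rotation angle of a 3x3 rotation matrix, in degrees."""
    cos_angle = (np.trace(R) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from round-off.
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


def rra(pred, gt, threshold_deg: float = 5.0) -> float:
    """Percentage of camera pairs with relative rotation error below threshold."""
    errors = []
    for i, j in itertools.combinations(range(len(gt)), 2):
        rel_pred = pred[i].T @ pred[j]  # predicted relative rotation (assumed convention)
        rel_gt = gt[i].T @ gt[j]        # ground-truth relative rotation
        errors.append(rotation_angle_deg(rel_pred.T @ rel_gt))
    return 100.0 * float(np.mean(np.array(errors) < threshold_deg))


def rot_z(angle_deg: float) -> np.ndarray:
    """Rotation about the z-axis, used only to build toy data."""
    a = np.radians(angle_deg)
    return np.array([
        [np.cos(a), -np.sin(a), 0.0],
        [np.sin(a), np.cos(a), 0.0],
        [0.0, 0.0, 1.0],
    ])


# Toy data: three ground-truth cameras rotated by 0, 30, and 60 degrees.
gt = [rot_z(0.0), rot_z(30.0), rot_z(60.0)]
# A prediction that differs only by a global rotation has identical relative rotations.
pred_global = [rot_z(45.0) @ R for R in gt]
# A prediction whose third camera is off by 10 degrees breaks two of the three pairs.
pred_one_off = [rot_z(0.0), rot_z(30.0), rot_z(70.0)]

score_global = rra(pred_global, gt)
score_one_off = rra(pred_one_off, gt)
```

Because the metric uses only relative rotations, it is invariant to the global coordinate frame in which each method expresses its poses, which is why it suits comparisons between reconstructions that are defined only up to a similarity transform.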