Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs
About
This work addresses pose-free novel view synthesis from stereo pairs, a challenging and relatively unexplored task in 3D vision. Unlike prior approaches, our framework integrates 2D correspondence matching, camera pose estimation, and NeRF rendering in a single model, so that the three tasks reinforce one another. The architecture is built around a shared representation that serves as a common foundation for 3D geometry understanding across all three tasks. To exploit the interplay between the tasks, the unified framework is trained end-to-end with a dedicated training strategy that improves overall accuracy. Extensive evaluations on diverse indoor and outdoor scenes from two real-world datasets show substantial improvements over previous methods, especially under extreme viewpoint changes and when accurate camera poses are unavailable.
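NeRF rendering is one of the three integrated tasks. For reference, the standard NeRF volume-rendering quadrature turns per-sample densities and colors along one camera ray into a pixel color: C = Σ_i T_i·α_i·c_i, with α_i = 1 − exp(−σ_i·δ_i) and T_i = Π_{j<i}(1 − α_j). The sketch below is a generic, pure-Python illustration of that formula. The function name `composite_ray` and its inputs are hypothetical, and this is not the paper's renderer, which may differ in how samples, densities, and colors are produced.

```python
import math
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def composite_ray(
    sigmas: Sequence[float], deltas: Sequence[float], colors: Sequence[RGB]
) -> Tuple[RGB, List[float]]:
    """Alpha-composite samples along one ray (standard NeRF quadrature).

    Hypothetical helper for illustration only.
    sigmas: non-negative volume densities, ordered from near to far.
    deltas: distances between adjacent samples along the ray.
    colors: RGB color at each sample.
    Returns the rendered RGB color and the per-sample weights w_i = T_i * alpha_i.
    """
    if not (len(sigmas) == len(deltas) == len(colors)):
        raise ValueError("sigmas, deltas and colors must have equal length")

    transmittance = 1.0  # T_1 = 1: nothing occludes the first sample
    weights: List[float] = []
    rgb = [0.0, 0.0, 0.0]
    for sigma, delta, color in zip(sigmas, deltas, colors):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        w = transmittance * alpha  # contribution of this sample to the pixel
        weights.append(w)
        for k in range(3):
            rgb[k] += w * color[k]
        transmittance *= 1.0 - alpha  # light surviving past this segment
    return (rgb[0], rgb[1], rgb[2]), weights
```

A fully opaque first sample (very large density) returns exactly that sample's color, and the weights always sum to one minus the transmittance left at the end of the ray. In a pose-free setting the ray origins and directions themselves come from the estimated camera pose, which is one reason rendering quality and pose accuracy are coupled.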
Related benchmarks
| Task | Dataset | Metric (lower is better) | Result | Rank |
|---|---|---|---|---|
| Novel View Synthesis | RealEstate10K, t=5 (test) | LPIPS | 0.171 | 16 |
| Novel View Synthesis | RealEstate10K, t=10 (test) | LPIPS | 0.209 | 14 |
| Stereo Video Synthesis | RealEstate10K (test) | FVD | 290 | 8 |
| Pose Estimation | RealEstate10K (Small) | Rotation avg. error (°) | 5.471 | 7 |
| Pose Estimation | RealEstate10K (Medium) | Rotation avg. error (°) | 2.183 | 7 |
| Pose Estimation | RealEstate10K (Large) | Rotation avg. error (°) | 1.529 | 7 |
| Pose Estimation | RealEstate10K (Avg) | Rotation avg. error (°) | 3.61 | 7 |
| Pose Estimation | ACID (Small) | Rotation avg. error (°) | 3.548 | 7 |
| Pose Estimation | ACID (Medium) | Rotation avg. error (°) | 2.573 | 7 |
| Pose Estimation | ACID (Avg) | Rotation avg. error (°) | 3.283 | 7 |
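The Pose Estimation rows report a rotation error in degrees. A common definition is the geodesic angle between the predicted and ground-truth rotation matrices, θ = arccos((trace(R_predᵀ R_gt) − 1) / 2), averaged over test pairs. The sketch below implements that common definition in pure Python. The benchmark's exact protocol (for example, how the Small/Medium/Large splits are defined or how the average is weighted) is not stated here, so treat this as an assumption rather than the official evaluation code. The names `rotation_error_deg`, `mean_rotation_error_deg`, and `rot_z` are hypothetical.

```python
import math
from typing import List, Sequence

Matrix3 = Sequence[Sequence[float]]


def rotation_error_deg(r_pred: Matrix3, r_gt: Matrix3) -> float:
    """Geodesic angle in degrees between two 3x3 rotation matrices (hypothetical helper)."""
    # trace(A^T B) equals the element-wise sum of A_ij * B_ij,
    # so no explicit transpose or matrix product is needed.
    trace = sum(r_pred[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    cos_theta = (trace - 1.0) / 2.0
    cos_theta = max(-1.0, min(1.0, cos_theta))  # clamp against round-off
    return math.degrees(math.acos(cos_theta))


def mean_rotation_error_deg(preds: Sequence[Matrix3], gts: Sequence[Matrix3]) -> float:
    """Average geodesic rotation error over paired predictions and ground truths."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("need equally many, non-zero predictions and ground truths")
    errors = [rotation_error_deg(p, g) for p, g in zip(preds, gts)]
    return sum(errors) / len(errors)


def rot_z(deg: float) -> List[List[float]]:
    """Rotation about the z-axis by `deg` degrees (test helper)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
```

For instance, `rotation_error_deg(rot_z(10), rot_z(0))` returns approximately 10.0. Working with the trace rather than comparing Euler angles keeps the metric independent of the angle parameterization, and clamping the cosine avoids `math.acos` domain errors caused by floating-point round-off.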