Dense Dynamic Scene Reconstruction and Camera Pose Estimation from Multi-View Videos
About
We address the challenging problem of dense dynamic scene reconstruction and camera pose estimation from multiple freely moving cameras, a setting that arises naturally when multiple observers capture a shared event. Prior approaches either handle only single-camera input or require rigidly mounted, pre-calibrated camera rigs, limiting their practical applicability. We propose a two-stage optimization framework that decouples the task into robust camera tracking and dense depth refinement. In the first stage, we extend single-camera visual SLAM to the multi-camera setting by constructing a spatiotemporal connection graph that exploits both intra-camera temporal continuity and inter-camera spatial overlap, enabling consistent scale and robust tracking. To ensure robustness under limited overlap, we introduce a wide-baseline initialization strategy using feed-forward reconstruction models. In the second stage, we refine depth and camera poses by optimizing dense inter- and intra-camera consistency using wide-baseline optical flow. Additionally, we introduce MultiCamRobolab, a new real-world dataset with ground-truth poses from a motion capture system. Finally, we demonstrate that our method significantly outperforms state-of-the-art feed-forward models on both synthetic and real-world benchmarks, while requiring less memory.
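The spatiotemporal connection graph from the first stage can be illustrated with a small sketch. This is not the authors' implementation: the function name, the `temporal_window` and `min_overlap` parameters, and the `overlap_fn` callback are hypothetical. It also assumes frames are time-synchronized across cameras, which the abstract does not state. Nodes are `(camera, frame)` pairs. Intra-camera edges link temporally nearby frames of one camera, and inter-camera edges link same-time frames of two cameras whose views overlap enough.

```python
from itertools import combinations


def build_spatiotemporal_graph(num_cameras, num_frames, temporal_window=2,
                               overlap_fn=None, min_overlap=0.5):
    """Return a set of edges between (camera, frame) nodes.

    overlap_fn(cam_a, cam_b, t) is a hypothetical callback returning an
    overlap score in [0, 1] for two cameras at frame index t.
    """
    edges = set()

    # Intra-camera edges: temporal continuity within each video.
    for cam in range(num_cameras):
        for t in range(num_frames):
            for dt in range(1, temporal_window + 1):
                if t + dt < num_frames:
                    edges.add(((cam, t), (cam, t + dt)))

    # Inter-camera edges: spatial overlap between different cameras
    # (assumes synchronized frame indices; an illustrative simplification).
    if overlap_fn is not None:
        for cam_a, cam_b in combinations(range(num_cameras), 2):
            for t in range(num_frames):
                if overlap_fn(cam_a, cam_b, t) >= min_overlap:
                    edges.add(((cam_a, t), (cam_b, t)))

    return edges
```

In a real system the overlap score would come from something like feature matching or a feed-forward reconstruction model. Here it is left as a pluggable callback so the graph logic stays independent of that choice.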
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Camera Trajectory Estimation | MultiCamVideo | ATE | 0.005 | 6 |
| Camera Trajectory Estimation | MultiCamRobolab RoboDog overlap | ATE | 0.011 | 6 |
| Camera Trajectory Estimation | MultiCamRobolab RoboDog non-overlap | ATE | 0.026 | 5 |
| Camera Trajectory Estimation | MultiCamRobolab RoboArm | ATE | 0.005 | 6 |
| Camera Trajectory Estimation | MultiCamRobolab DynamicHuman | ATE | 0.013 | 6 |
| Camera Trajectory Estimation | MultiCamRobolab 3-cameras | ATE | 0.02 | 5 |
| Depth and Scene Consistency | MultiCamRobolab RoboDog overlap | Abs. Rel. Error | 0.011 | 5 |
| Depth and Scene Consistency | MultiCamRobolab RoboDog non-overlap | Abs. Rel. Error | 0.018 | 5 |
| Depth and Scene Consistency | MultiCamRobolab RoboArm | Abs. Rel. Error | 0.059 | 5 |
| Depth and Scene Consistency | MultiCamRobolab DynamicHuman | Abs. Rel. Error | 0.03 | 5 |
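
For readers unfamiliar with the two metrics in the table, the sketch below shows how absolute trajectory error (ATE) and absolute relative depth error are commonly computed. It makes several assumptions: ATE is taken as the RMSE of camera positions that are already expressed in the ground-truth frame, whereas evaluation protocols usually apply a rigid or similarity alignment first, which is omitted here. The page does not say which alignment, units, or depth-validity mask were used for the reported numbers, and the function names are illustrative.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of per-frame position error between two pre-aligned trajectories.

    Each trajectory is a sequence of (x, y, z) positions of equal length.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est_pos, gt_pos))
        for est_pos, gt_pos in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def abs_rel_error(pred_depth, gt_depth):
    """Mean of |pred - gt| / gt over pixels with positive ground-truth depth."""
    valid = [(p, g) for p, g in zip(pred_depth, gt_depth) if g > 0]
    if not valid:
        raise ValueError("no valid ground-truth depth values")
    return sum(abs(p - g) / g for p, g in valid) / len(valid)
```

Lower is better for both metrics. ATE carries the units of the trajectory, and the relative depth error is unitless.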