Dense Dynamic Scene Reconstruction and Camera Pose Estimation from Multi-View Videos

About

We address the challenging problem of dense dynamic scene reconstruction and camera pose estimation from multiple freely moving cameras -- a setting that arises naturally when multiple observers capture a shared event. Prior approaches either handle only single-camera input or require rigidly mounted, pre-calibrated camera rigs, limiting their practical applicability. We propose a two-stage optimization framework that decouples the task into robust camera tracking and dense depth refinement. In the first stage, we extend single-camera visual SLAM to the multi-camera setting by constructing a spatiotemporal connection graph that exploits both intra-camera temporal continuity and inter-camera spatial overlap, enabling consistent scale and robust tracking. To ensure robustness under limited overlap, we introduce a wide-baseline initialization strategy using feed-forward reconstruction models. In the second stage, we refine depth and camera poses by optimizing dense inter- and intra-camera consistency using wide-baseline optical flow. Additionally, we introduce MultiCamRobolab, a new real-world dataset with ground-truth poses from a motion capture system. Finally, we demonstrate that our method significantly outperforms state-of-the-art feed-forward models on both synthetic and real-world benchmarks, while requiring less memory.
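
As a rough illustration of the first-stage idea, the sketch below builds a graph over (camera, frame) nodes with temporal edges inside each camera and spatial edges between cameras that overlap at the same frame. It is not the authors' implementation: the function name, the `overlap` score table, the `temporal_window` and `min_overlap` parameters, and the assumption that cameras are frame-synchronized are all hypothetical choices made for this example.

```python
from itertools import combinations


def build_connection_graph(num_cams, num_frames, overlap,
                           temporal_window=2, min_overlap=0.3):
    """Return a set of undirected edges between (camera, frame) nodes.

    overlap[(a, b)][t] is a hypothetical overlap score in [0, 1] between
    cameras a and b at frame t (an assumption of this sketch; the abstract
    does not say how overlap is measured).
    """
    edges = set()

    # Intra-camera edges: link each frame to its next few frames in time.
    for cam in range(num_cams):
        for t in range(num_frames):
            for dt in range(1, temporal_window + 1):
                if t + dt < num_frames:
                    edges.add(((cam, t), (cam, t + dt)))

    # Inter-camera edges: link two cameras at the same frame index
    # when their overlap score is high enough.
    for a, b in combinations(range(num_cams), 2):
        scores = overlap.get((a, b), [])
        for t, score in enumerate(scores[:num_frames]):
            if score >= min_overlap:
                edges.add(((a, t), (b, t)))

    return edges


if __name__ == "__main__":
    toy_overlap = {(0, 1): [0.5, 0.1, 0.9]}
    graph = build_connection_graph(2, 3, toy_overlap, temporal_window=1)
    print(len(graph), "edges")
```

In this toy setup, inter-camera edges are what tie the per-camera trajectories into one shared frame of reference; with no such edges the graph splits into one disconnected component per camera.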

Shuo Sun, Unal Artan, Malcolm Mielle, Achim J. Lilienthal, and Martin Magnusson • 2026

Related benchmarks

Task	Dataset	Result
Camera Trajectory Estimation	MultiCamVideo	ATE 0.005	6
Camera Trajectory Estimation	MultiCamRobolab RoboDog overlap	ATE 0.011	6
Camera Trajectory Estimation	MultiCamRobolab RoboArm	ATE 0.005	6
Camera Trajectory Estimation	MultiCamRobolab DynamicHuman	ATE 0.013	6
Camera Trajectory Estimation	MultiCamRobolab 3-cameras	ATE 0.02	5
Depth and Scene Consistency	MultiCamRobolab RoboDog overlap	Abs. Rel. Error 0.011	5
Depth and Scene Consistency	MultiCamRobolab RoboArm	Abs. Rel. Error 0.059	5
Depth and Scene Consistency	MultiCamRobolab RoboDog non-overlap	Abs. Rel. Error 0.018	5
Depth and Scene Consistency	MultiCamRobolab DynamicHuman	Abs. Rel. Error 0.03	5
Camera Trajectory Estimation	MultiCamRobolab RoboDog non-overlap	ATE 0.026	5
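
For readers unfamiliar with the two metrics in the table, here is a minimal sketch of their commonly used definitions: ATE as the root-mean-square translation error between estimated and ground-truth camera positions, and absolute relative depth error as the mean of |predicted - true| / true over valid pixels. The function names are hypothetical, and the sketch assumes the trajectories are already expressed in the same frame and scale; the alignment and masking protocol behind the numbers above is not stated on this page.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of per-frame position error.

    Assumes both trajectories are lists of (x, y, z) positions that are
    already aligned (same frame and scale); alignment is not done here.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and equal length")
    squared = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared) / len(squared))


def abs_rel_error(predicted, ground_truth):
    """Mean of |pred - gt| / gt over pixels with positive ground truth."""
    valid = [(p, g) for p, g in zip(predicted, ground_truth) if g > 0]
    if not valid:
        raise ValueError("no valid ground-truth depth values")
    return sum(abs(p - g) / g for p, g in valid) / len(valid)


if __name__ == "__main__":
    est = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.2)]
    print("ATE:", ate_rmse(est, gt))
    print("Abs. Rel.:", abs_rel_error([1.1, 2.0, 5.0], [1.0, 2.0, 0.0]))
```

Lower is better for both metrics. The depth function skips pixels without a positive ground-truth value, which is one common convention rather than a confirmed detail of this benchmark.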

