R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras
About
Dense 3D reconstruction and ego-motion estimation are key challenges in autonomous driving and robotics. Compared to the complex, multi-modal systems deployed today, multi-camera systems provide a simpler, low-cost alternative. However, camera-based 3D reconstruction of complex dynamic scenes has proven extremely difficult, as existing solutions often produce incomplete or incoherent results. We propose R3D3, a multi-camera system for dense 3D reconstruction and ego-motion estimation. Our approach iterates between geometric estimation that exploits spatial-temporal information from multiple cameras, and monocular depth refinement. We integrate multi-camera feature correlation and dense bundle adjustment operators that yield robust geometric depth and pose estimates. To improve reconstruction where geometric depth is unreliable, e.g., for moving objects or low-textured regions, we introduce learnable scene priors via a depth refinement network. We show that this design enables a dense, consistent 3D reconstruction of challenging, dynamic outdoor environments. Consequently, we achieve state-of-the-art dense depth prediction on the DDAD and nuScenes benchmarks.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Multi-camera point cloud reconstruction | Real multi-camera datasets | FPS | 21 | 14 |
| Depth Estimation | nuScenes v1.0 (train val) | Absolute Relative Error (Abs Rel) | 0.25 | 7 |
| 3D Scene Reconstruction | nuScenes (val) | Abs Rel | 0.259 | 5 |
| Multi-camera Dynamic Scene 3D Reconstruction | Multi-camera Dynamic Scenes Structured | Temporal Stability Error | 0.18 | 2 |
| Multi-camera Dynamic Scene 3D Reconstruction | Multi-camera Dynamic Scenes Complex | Temporal Stability Error (E_temp) | 0.24 | 2 |
| Multi-camera Dynamic Scene 3D Reconstruction | Multi-camera Scenes (Dynamic) | Temporal Stability Error (E_temp) | 0.33 | 2 |
| Multi-view 3D Reconstruction | Multi-camera Dynamic scenes | MC Error | 17.2 | 2 |
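
For readers unfamiliar with the Abs Rel metric reported in the table, the following is a minimal sketch of how absolute relative error is commonly computed for depth maps: the mean of |predicted − ground truth| / ground truth over pixels that have valid ground truth. The function name `abs_rel` and the `min_depth`/`max_depth` validity thresholds are illustrative assumptions, not values taken from this page or from the R3D3 evaluation code; each benchmark defines its own depth range and masking protocol.

```python
import numpy as np


def abs_rel(pred, gt, min_depth=1e-3, max_depth=200.0):
    """Mean absolute relative depth error over pixels with valid ground truth.

    min_depth and max_depth are assumed, illustrative thresholds; pixels whose
    ground-truth depth falls outside (min_depth, max_depth) are ignored.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")

    # Sparse ground truth (e.g. projected LiDAR) typically marks missing pixels as 0.
    valid = (gt > min_depth) & (gt < max_depth)
    if not valid.any():
        raise ValueError("no valid ground-truth pixels")

    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))


# Toy example: the pixel with ground truth 0 is treated as missing and skipped.
pred_depth = np.array([[5.0, 11.0], [18.0, 40.0]])
gt_depth = np.array([[0.0, 10.0], [20.0, 40.0]])
score = abs_rel(pred_depth, gt_depth)
```

Lower is better: a value of 0.25 means predictions deviate from ground truth by 25% of the true depth on average over the evaluated pixels.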