Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
About
Multi-view 3D reconstruction remains a core challenge in computer vision, particularly in applications requiring accurate and scalable representations across diverse perspectives. Current leading methods such as DUSt3R employ a fundamentally pairwise approach, processing images in pairs and necessitating costly global alignment procedures to reconstruct from multiple views. In this work, we propose Fast 3D Reconstruction (Fast3R), a novel multi-view generalization of DUSt3R that achieves efficient and scalable 3D reconstruction by processing many views in parallel. Fast3R's Transformer-based architecture processes N images in a single forward pass, bypassing the need for iterative alignment. Through extensive experiments on camera pose estimation and 3D reconstruction, Fast3R demonstrates state-of-the-art performance, with significant improvements in inference speed and reduced error accumulation. These results establish Fast3R as a robust alternative for multi-view applications, offering enhanced scalability without compromising reconstruction accuracy.
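To make the contrast between pairwise processing and a single forward pass concrete, here is a minimal NumPy sketch. It assumes a pairwise method that runs on a complete pair graph (N(N-1)/2 pairs), and it models the "single forward pass" as one self-attention step over the concatenated tokens of all N views. The function names, the per-view index embedding, and the random projection weights are hypothetical stand-ins for illustration only, not Fast3R's actual layers or code.

```python
import numpy as np

def num_pairs(n_views: int) -> int:
    """Image pairs a pairwise method processes if it uses a complete pair graph (assumption)."""
    return n_views * (n_views - 1) // 2

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def joint_view_attention(tokens: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Toy single-head self-attention over the tokens of ALL views at once.

    tokens: (N, T, D) array of per-image patch tokens (hypothetical encoder output).
    Returns an array of the same shape in which every token has attended to
    every token of every view in one pass, with no pairwise stage.
    """
    n, t, d = tokens.shape
    # Hypothetical per-view index embedding so fused tokens keep track of their source image.
    view_embed = rng.standard_normal((n, 1, d)) * 0.02
    x = (tokens + view_embed).reshape(n * t, d)  # concatenate all views into one sequence
    # Random projections stand in for learned weights (illustration only).
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)  # (N*T, N*T): attention across views
    return (attn @ v).reshape(n, t, d)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 16, 32))  # 8 views, 16 tokens each, width 32
fused = joint_view_attention(tokens, rng)
print(fused.shape)      # (8, 16, 32)
print(num_pairs(8))     # 28
print(num_pairs(1000))  # 499500
```

The point of the sketch is the bookkeeping: at 1000 views a complete pair graph means 499,500 pairwise inferences plus a global alignment, whereas the joint-attention formulation handles every view in one pass (at the cost of attention over N*T tokens).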
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Monocular Depth Estimation | KITTI | Abs Rel | 0.12 | 161 |
| Monocular Depth Estimation | NYU V2 | -- | -- | 113 |
| Video Depth Estimation | Sintel | Relative Error (Rel) | 0.518 | 109 |
| Video Depth Estimation | BONN | Relative Error (Rel) | 0.193 | 103 |
| Camera pose estimation | Sintel | ATE | 0.371 | 92 |
| Camera pose estimation | ScanNet | ATE RMSE (Avg.) | 0.155 | 61 |
| Camera pose estimation | TUM dynamics | RRE | 1.425 | 57 |
| 3D Reconstruction | DTU | Accuracy Median | 1.706 | 47 |
| Video Depth Estimation | KITTI | Abs Rel | 0.138 | 47 |
| 3D Reconstruction | Neural RGB-D (NRGBD) | Acc Mean | 0.135 | 38 |
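Several depth rows above report Abs Rel (absolute relative error), conventionally the mean of |predicted depth − ground-truth depth| / ground-truth depth over pixels with valid ground truth. The sketch below computes that quantity under two assumptions: predictions are already aligned to the ground-truth scale (each benchmark's exact alignment protocol is not given here), and a ground-truth value of 0 marks a missing pixel. The `abs_rel` name is hypothetical.

```python
import numpy as np

def abs_rel(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute relative depth error over pixels with valid (positive) ground truth."""
    valid = gt > 0  # assumption: 0 marks missing ground truth
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))

gt = np.array([[1.0, 2.0], [4.0, 0.0]])    # bottom-right pixel has no ground truth
pred = np.array([[1.1, 1.8], [4.0, 3.0]])  # its prediction is therefore ignored
print(abs_rel(pred, gt))  # mean of 0.1, 0.1, 0.0, approximately 0.0667
```

Lower is better for Abs Rel, as it is for the other error metrics in the table (Rel, ATE, RRE).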