StreetSurf: Extending Multi-view Implicit Surface Reconstruction to Street Views
About
We present a novel multi-view implicit surface reconstruction technique, termed StreetSurf, that is readily applicable to street view images in widely used autonomous driving datasets, such as Waymo-perception sequences, without necessarily requiring LiDAR data. As neural rendering research expands rapidly, its integration into street views has started to draw interest. Existing approaches to street views either focus mainly on novel view synthesis with little exploration of the scene geometry, or rely heavily on dense LiDAR data when investigating reconstruction. Neither of them investigates multi-view implicit surface reconstruction, especially under settings without LiDAR data. Our method extends prior object-centric neural surface reconstruction techniques to address the unique challenges posed by unbounded street views that are captured with non-object-centric, long and narrow camera trajectories. We delimit the unbounded space into three parts (close-range, distant-view and sky) with aligned cuboid boundaries, and adapt cuboid/hyper-cuboid hash-grids along with a road-surface initialization scheme for a finer and disentangled representation. To further address the geometric errors arising from textureless regions and insufficient viewing angles, we adopt geometric priors that are estimated using general-purpose monocular models. Coupled with our implementation of an efficient and fine-grained multi-stage ray marching strategy, we achieve state-of-the-art reconstruction quality in both geometry and appearance within only one to two hours of training time on a single RTX 3090 GPU for each street view sequence. Furthermore, we demonstrate that the reconstructed implicit surfaces have rich potential for various downstream tasks, including ray tracing and LiDAR simulation.
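The split of each camera ray between the close-range cuboid and the distant-view region can be illustrated with a standard ray/axis-aligned-box (slab) intersection. The following is a minimal sketch under stated assumptions, not the StreetSurf implementation: the function names `ray_aabb_interval` and `split_ray`, the box extents, and the returned dictionary layout are all hypothetical, and the sky part is not modeled here.

```python
import math

def ray_aabb_interval(origin, direction, box_min, box_max):
    """Slab test: return (t_near, t_far) for the part of the ray
    origin + t * direction (t >= 0) that lies inside the box, or None."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of slabs: it must start between them.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near, t_far

def split_ray(origin, direction, box_min, box_max):
    """Hypothetical helper: assign the in-box segment to the close-range
    model and everything past the box exit to the distant-view model."""
    hit = ray_aabb_interval(origin, direction, box_min, box_max)
    if hit is None:
        # The ray never enters the close-range cuboid.
        return {"close_range": None, "distant_view_from": 0.0}
    t_near, t_far = hit
    return {"close_range": (t_near, t_far), "distant_view_from": t_far}

# Assumed cuboid (metres): long and narrow, aligned with the driving direction.
BOX_MIN = (-10.0, -5.0, -2.0)
BOX_MAX = (100.0, 5.0, 8.0)

forward = split_ray((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), BOX_MIN, BOX_MAX)
upward = split_ray((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), BOX_MIN, BOX_MAX)
outside = split_ray((0.0, 20.0, 0.0), (1.0, 0.0, 0.0), BOX_MIN, BOX_MAX)
```

In this sketch a ray looking along the trajectory stays in the close-range cuboid for a long interval, while a ray looking upward leaves it quickly. That is one way to see why a long and narrow cuboid suits street-view camera trajectories better than an object-centric sphere.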
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Depth Synthesis | Data Night (Evaluated on Accumulated LiDAR Ground-Truth Points) (test) | RMSE | 10.86 | 16 |
| Depth Synthesis | Accumulated LiDAR Ground-Truth Points Day (test) | RMSE | 9.6 | 16 |
| Depth Estimation | Waymo | RMSE | 3.35 | 11 |
| Surface Reconstruction | KITTI-360 static scenes 2021 | Seq 31 P->M Error (m) | 0.09 | 6 |
| Surface Reconstruction | Waymo Open Dataset static scenes | P->M Error (Seq 10061, m) | 0.22 | 6 |
| Surface Reconstruction | nuScenes static scenes | Seq. 0034 P->M Error (m) | 0.78 | 6 |
| Surface Reconstruction | Pandaset static scenes | P->M Error (Seq 23) | 2.33 | 6 |
| 3D Occupancy Reconstruction | Voxelized accumulated LiDAR pointcloud (test) | IoU | 5.41 | 5 |
| Scene Reconstruction | NOTR static-32 | PSNR | 26.2 | 4 |