
Atlas: End-to-End 3D Scene Reconstruction from Posed Images

About

We present an end-to-end 3D reconstruction method for a scene by directly regressing a truncated signed distance function (TSDF) from a set of posed RGB images. Traditional approaches to 3D reconstruction rely on an intermediate representation of depth maps prior to estimating a full 3D model of a scene. We hypothesize that a direct regression to 3D is more effective. A 2D CNN extracts features from each image independently, which are then back-projected and accumulated into a voxel volume using the camera intrinsics and extrinsics. After accumulation, a 3D CNN refines the accumulated features and predicts the TSDF values. Additionally, semantic segmentation of the 3D model is obtained without significant extra computation. This approach is evaluated on the ScanNet dataset, where we significantly outperform state-of-the-art baselines (deep multi-view stereo followed by traditional TSDF fusion) both quantitatively and qualitatively. We compare our 3D semantic segmentation to prior methods that use a depth sensor, since no previous work attempts the problem with only RGB input.
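The back-projection step described above can be sketched in a few lines of numpy. This is a simplified illustration, not the paper's implementation: it projects every voxel center through one camera's intrinsics and extrinsics and gathers the 2D feature at the resulting pixel (the function name, nearest-neighbor sampling, and single-view accumulation are assumptions for clarity; Atlas accumulates a running average over many views and uses learned features).

```python
import numpy as np

def backproject_features(feat2d, K, T_cam_world, origin, voxel_size, grid_dims):
    """Scatter a 2D feature map into a voxel volume (illustrative sketch).

    feat2d:       (C, H, W) feature map from a 2D CNN
    K:            (3, 3) camera intrinsics
    T_cam_world:  (4, 4) world-to-camera extrinsics
    origin:       (3,) world position of voxel (0, 0, 0)
    voxel_size:   voxel edge length in meters
    grid_dims:    (nx, ny, nz) voxel grid size
    """
    C, H, W = feat2d.shape
    nx, ny, nz = grid_dims

    # World coordinates of every voxel center
    ii, jj, kk = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz),
                             indexing="ij")
    pts = np.stack([ii, jj, kk], axis=-1).reshape(-1, 3) * voxel_size + origin

    # Transform to the camera frame, then project with the intrinsics
    pts_h = np.concatenate([pts, np.ones((pts.shape[0], 1))], axis=1)  # (N, 4)
    cam = (T_cam_world @ pts_h.T)[:3]        # (3, N) camera-frame points
    uvd = K @ cam                            # homogeneous pixel coordinates
    z = uvd[2]
    valid = z > 1e-6                         # keep voxels in front of the camera
    u = np.round(uvd[0] / np.where(valid, z, 1.0)).astype(int)
    v = np.round(uvd[1] / np.where(valid, z, 1.0)).astype(int)
    valid &= (u >= 0) & (u < W) & (v >= 0) & (v < H)

    # Gather features for visible voxels; voxels outside the frustum get zeros
    vol = np.zeros((C, pts.shape[0]), dtype=feat2d.dtype)
    vol[:, valid] = feat2d[:, v[valid], u[valid]]
    return vol.reshape(C, nx, ny, nz), valid.reshape(nx, ny, nz)
```

To fuse multiple posed images, one would call this per view, sum the volumes, and divide by the per-voxel visibility count before handing the result to the 3D CNN.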

Zak Murez, Tarrence van As, James Bartolozzi, Ayan Sinha, Vijay Badrinarayanan, Andrew Rabinovich · 2020

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| 3D Semantic Segmentation | ScanNet v2 (test) | mIoU | 34 | 110 |
| 3D Semantic Segmentation | ScanNet (test) | mIoU | 34 | 105 |
| 3D Semantic Segmentation | ScanNet (val) | mIoU | 36.8 | 100 |
| 3D Geometry Reconstruction | ScanNet | Accuracy | 13 | 54 |
| 3D Semantic Occupancy Prediction | SurroundOcc-nuScenes (val) | IoU | 28.66 | 31 |
| 2D Depth Estimation | ScanNet | AbsRel | 0.061 | 26 |
| 3D Scene Reconstruction | ScanNet v2 (test) | Accuracy | 0.084 | 26 |
| 3D Semantic Occupancy Prediction | nuScenes 1.0 (val) | IoU | 28.66 | 21 |
| Depth Estimation | TUM-RGBD | Abs Rel Error | 0.163 | 16 |
| 3D Semantic Occupancy Prediction | SurroundOcc v1.0 (test) | IoU | 28.66 | 15 |

Showing 10 of 24 rows.
