ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes
About
We present ScanNet++, a large-scale dataset that couples the capture of high-quality and commodity-level geometry and color for indoor scenes. Each scene is captured with a high-end laser scanner at sub-millimeter resolution, along with registered 33-megapixel images from a DSLR camera and RGB-D streams from an iPhone. Scene reconstructions are further annotated with an open vocabulary of semantics, and label-ambiguous scenarios are explicitly annotated for comprehensive semantic understanding. ScanNet++ enables a new real-world benchmark for novel view synthesis, both from high-quality RGB capture and, importantly, from commodity-level images. It also enables a new benchmark for 3D semantic scene understanding that covers diverse and ambiguous semantic labeling scenarios. Currently, ScanNet++ contains 460 scenes, 280,000 captured DSLR images, and over 3.7M iPhone RGB-D frames.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Semantic segmentation | ADE20K | mIoU | 48.29 | 936 |
| Monocular Depth Estimation | KITTI | Abs Rel | 0.0679 | 161 |
| Monocular Depth Estimation | NYU V2 | -- | -- | 113 |
| Depth Estimation | ScanNet | AbsRel | 0.1166 | 94 |
| Surface Normal Estimation | NYU V2 | RMSE | 30.57 | 23 |
| Semantic segmentation | ScanNet++ | Average Accuracy (aAcc) | 84.9 | 8 |
| Monocular Depth Estimation | ScanNet++ (val) | Relative Error (Rel) | 0.242 | 8 |
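
For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how absolute relative error (Abs Rel) and mean intersection-over-union (mIoU) are commonly defined. It is not taken from the ScanNet++ evaluation code: the function names `abs_rel` and `mean_iou`, the flat-list inputs, and the rule that non-positive ground-truth depths are treated as invalid are all assumptions made for illustration. Note that the sketch returns mIoU as a fraction, whereas the table appears to report it as a percentage.

```python
def abs_rel(pred, gt):
    """Mean of |pred - gt| / gt over pixels with valid ground truth.

    Assumption: a ground-truth depth <= 0 marks a missing measurement.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def mean_iou(pred, gt, num_classes):
    """Average per-class IoU, skipping classes absent from both inputs."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union > 0:
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class present in prediction or ground truth")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Toy values only; these are not ScanNet++ data.
    print(abs_rel([1.1, 2.0, 3.0], [1.0, 2.0, 0.0]))
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Benchmarks often differ in details such as depth clipping ranges, ignored labels, and per-scene versus per-pixel averaging, so leaderboard numbers should be compared only under each benchmark's own protocol.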