ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes
About
A key requirement for leveraging supervised deep learning methods is the availability of large, labeled datasets. Unfortunately, in the context of RGB-D scene understanding, very little data is available -- current datasets cover a small range of scene views and have limited semantic annotations. To address this issue, we introduce ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations. To collect this data, we designed an easy-to-use and scalable RGB-D capture system that includes automated surface reconstruction and crowdsourced semantic annotation. We show that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks, including 3D object classification, semantic voxel labeling, and CAD model retrieval. The dataset is freely available at http://www.scan-net.org.
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Instance Segmentation | COCO 2017 (val) | -- | 1144 |
| Instance Segmentation | Cityscapes (val) | -- | 239 |
| 3D Semantic Segmentation | ScanNet v2 (test) | mIoU 30.6 | 110 |
| 3D Semantic Segmentation | ScanNet (test) | mIoU 30.6 | 105 |
| 3D Semantic Segmentation | ScanNet v1 (test) | mAcc 90.3 | 72 |
| Semantic Segmentation | ScanNet (test) | mIoU 30.6 | 59 |
| Semantic Segmentation | S3DIS (test) | mIoU 24.6 | 47 |
| Instance Segmentation | ScanNet (val) | -- | 39 |
| 3D Question Answering | ScanQA v1.0 (test) | ROUGE 33.3 | 26 |
| Spatial Reasoning | VSI-Bench | -- | 24 |
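
The Result column reports mIoU and mAcc without defining them. As a rough illustration, the sketch below computes both from flat lists of per-point (or per-voxel) class labels. It assumes mIoU is the mean over classes of TP / (TP + FP + FN) and reads mAcc as mean per-class accuracy, TP / (TP + FN); the benchmark entries above may use a different definition (for example, overall voxel accuracy). The function names, the `ignore_label` parameter, and the handling of absent classes are illustrative choices, not part of ScanNet's evaluation code.

```python
from collections import Counter


def per_class_counts(gt, pred, ignore_label=-1):
    """Count true positives, false positives and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gt, pred):
        if g == ignore_label:  # assumption: unlabeled points are skipped
            continue
        if g == p:
            tp[g] += 1
        else:
            fn[g] += 1  # ground-truth class g was missed
            fp[p] += 1  # predicted class p was wrongly assigned
    return tp, fp, fn


def mean_iou(gt, pred, num_classes, ignore_label=-1):
    """Mean over classes of TP / (TP + FP + FN)."""
    tp, fp, fn = per_class_counts(gt, pred, ignore_label)
    ious = []
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        if denom == 0:  # class absent from both labels and predictions
            continue
        ious.append(tp[c] / denom)
    return sum(ious) / len(ious) if ious else 0.0


def mean_class_accuracy(gt, pred, num_classes, ignore_label=-1):
    """Mean over classes of TP / (TP + FN), i.e. per-class recall."""
    tp, _, fn = per_class_counts(gt, pred, ignore_label)
    accs = []
    for c in range(num_classes):
        support = tp[c] + fn[c]
        if support == 0:  # class never appears in the ground truth
            continue
        accs.append(tp[c] / support)
    return sum(accs) / len(accs) if accs else 0.0
```

For example, with `gt = [0, 0, 1, 1, 2, 2]` and `pred = [0, 1, 1, 1, 2, 0]`, the per-class IoUs are 1/3, 2/3 and 1/2, so `mean_iou(gt, pred, 3)` is about 0.5, while `mean_class_accuracy(gt, pred, 3)` is about 0.667. Leaderboards typically report these values scaled to percentages.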