Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds

About

To date, various 3D scene understanding tasks still lack practical and generalizable pre-trained models, primarily due to the intricate nature of these tasks and the immense variations introduced by camera views, lighting, occlusions, etc. In this paper, we tackle this challenge by introducing a spatio-temporal representation learning (STRL) framework, capable of learning from unlabeled 3D point clouds in a self-supervised fashion. Inspired by how infants learn from visual data in the wild, we explore the rich spatio-temporal cues derived from the 3D data. Specifically, STRL takes two temporally correlated frames from a 3D point cloud sequence as input, transforms them with spatial data augmentation, and learns an invariant representation in a self-supervised manner. To corroborate the efficacy of STRL, we conduct extensive experiments on three types of datasets (synthetic, indoor, and outdoor). Experimental results demonstrate that, compared with supervised learning methods, the learned self-supervised representation enables various models to attain comparable or even better performance, and that the pre-trained models generalize to downstream tasks, including 3D shape classification, 3D object detection, and 3D semantic segmentation. Moreover, the spatio-temporal contextual cues embedded in 3D point clouds significantly improve the learned representations.
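
The input pipeline described in the abstract (two temporally correlated frames, each passed through a spatial augmentation, then pulled toward a shared representation) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the abstract: the augmentation (random z-axis rotation, scaling, translation), the hypothetical `toy_encoder` (one linear layer, ReLU, max-pooling), the synthetic "next frame", and the cosine-based `invariance_loss`. No training or gradient step is shown.

```python
import math
import random


def random_spatial_augment(points, rng):
    """Assumed augmentation: random z-axis rotation, uniform scaling, translation."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(theta), math.sin(theta)
    scale = rng.uniform(0.8, 1.25)
    dx, dy, dz = (rng.uniform(-0.1, 0.1) for _ in range(3))
    return [
        (scale * (c * x - s * y) + dx, scale * (s * x + c * y) + dy, scale * z + dz)
        for x, y, z in points
    ]


def toy_encoder(points, weights):
    """Hypothetical stand-in encoder: per-point linear layer + ReLU, max-pooled over points."""
    return [
        max(max(0.0, wx * x + wy * y + wz * z) for x, y, z in points)
        for wx, wy, wz in weights
    ]


def invariance_loss(z_a, z_b, eps=1e-8):
    """Assumed objective: 2 - 2 * cosine similarity (0 when the embeddings align)."""
    dot = sum(a * b for a, b in zip(z_a, z_b))
    norm_a = math.sqrt(sum(a * a for a in z_a)) + eps
    norm_b = math.sqrt(sum(b * b for b in z_b)) + eps
    return 2.0 - 2.0 * dot / (norm_a * norm_b)


rng = random.Random(0)

# Frame t: a random point cloud. Frame t+1: the same cloud with small jitter,
# standing in for the next frame of a real sequence.
frame_t = [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(256)]
frame_t1 = [tuple(v + rng.gauss(0.0, 0.01) for v in p) for p in frame_t]

# Random, untrained encoder weights: 16 output features.
weights = [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(16)]

view_a = random_spatial_augment(frame_t, rng)
view_b = random_spatial_augment(frame_t1, rng)
loss = invariance_loss(toy_encoder(view_a, weights), toy_encoder(view_b, weights))
print(f"invariance loss between the two views: {loss:.4f}")
```

In a real setup the encoder would be a trainable point cloud network, and minimizing such a loss over many frame pairs is what would encourage representations that are invariant to the spatial and temporal changes.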

Siyuan Huang, Yichen Xie, Song-Chun Zhu, Yixin Zhu • 2021

Related benchmarks

Task                           Dataset            Result                Rank
Semantic segmentation          S3DIS (Area 5)     mIoU 64.71            799
3D Object Detection            ScanNet V2 (val)   mAP@0.25 59.5         352
Semantic segmentation          S3DIS (6-fold)     mIoU (Mean IoU) 57.1  315
3D Point Cloud Classification  ModelNet40 (test)  OA 93.1               297
Shape classification           ModelNet40 (test)  OA 93.1               255
Semantic segmentation          ScanNet (val)      mIoU 71.03            231
Object Classification          ModelNet40 (test)  Accuracy 93.1         180
3D Object Detection            SUN RGB-D (val)    mAP@0.25 58.2         158
3D Object Detection            ScanNet            mAP@0.25 59.5         123
3D Object Detection            SUN RGB-D          mAP@0.25 58.2         104
Showing 10 of 23 rows
