Masked Surfel Prediction for Self-Supervised Point Cloud Learning
About
Masked auto-encoding is a popular and effective self-supervised approach to point cloud learning. However, most existing methods reconstruct only the masked points and overlook local geometry information, which is also important for understanding point cloud data. In this work, we make the first attempt, to the best of our knowledge, to explicitly incorporate local geometry information into masked auto-encoding, and propose a novel Masked Surfel Prediction (MaskSurf) method. Specifically, given an input point cloud masked at a high ratio, we learn a transformer-based encoder-decoder network to estimate the underlying masked surfels by simultaneously predicting the surfel positions (i.e., points) and per-surfel orientations (i.e., normals). The predictions of points and normals are supervised by the Chamfer Distance and a newly introduced Position-Indexed Normal Distance, respectively, in a set-to-set manner. MaskSurf is validated on six downstream tasks under three fine-tuning strategies. In particular, MaskSurf outperforms its closest competitor, Point-MAE, by 1.2% on the real-world ScanObjectNN dataset under the OBJ-BG setting, supporting the advantage of masked surfel prediction over masked point cloud reconstruction. Code will be available at https://github.com/YBZh/MaskSurf.
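The set-to-set supervision described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the exact form of the Position-Indexed Normal Distance, so the version below assumes that normals are compared between surfels matched by nearest-neighbour position (the same matching the Chamfer Distance uses) and that the per-pair error is one minus cosine similarity. The function names and the choice of error are illustrative assumptions.

```python
import numpy as np


def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between every row of a (N, 3) and b (M, 3)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff * diff, axis=-1)


def chamfer_distance(pred_pts, gt_pts):
    """Symmetric Chamfer Distance (squared-distance variant) between two point sets."""
    d = pairwise_sq_dists(pred_pts, gt_pts)
    return d.min(axis=1).mean() + d.min(axis=0).mean()


def cosine_error(n1, n2):
    """Per-row error 1 - cos(angle) between two arrays of normals."""
    n1 = n1 / np.linalg.norm(n1, axis=-1, keepdims=True)
    n2 = n2 / np.linalg.norm(n2, axis=-1, keepdims=True)
    return 1.0 - np.sum(n1 * n2, axis=-1)


def position_indexed_normal_distance(pred_pts, pred_nrm, gt_pts, gt_nrm):
    """Assumed form: match surfels by position, then compare the matched normals."""
    d = pairwise_sq_dists(pred_pts, gt_pts)
    pred_to_gt = d.argmin(axis=1)  # nearest ground-truth index for each prediction
    gt_to_pred = d.argmin(axis=0)  # nearest predicted index for each ground truth
    forward = cosine_error(pred_nrm, gt_nrm[pred_to_gt]).mean()
    backward = cosine_error(gt_nrm, pred_nrm[gt_to_pred]).mean()
    return forward + backward


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt_pts = rng.normal(size=(32, 3))
    gt_nrm = rng.normal(size=(32, 3))
    # Perturb positions slightly; keep normals unchanged.
    pred_pts = gt_pts + 0.01 * rng.normal(size=(32, 3))
    print("Chamfer:", chamfer_distance(pred_pts, gt_pts))
    print("Normal distance:",
          position_indexed_normal_distance(pred_pts, gt_nrm, gt_pts, gt_nrm))
```

Indexing the normals by position matches means the orientation term needs no separate correspondence search, which is one plausible reading of "position-indexed"; a training loss would combine the two terms with a weighting that the abstract does not specify.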
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| 3D Object Classification | ModelNet40 (test) | -- | 302 |
| Shape classification | ModelNet40 (test) | -- | 255 |
| 3D Object Part Segmentation | ShapeNet Part (test) | -- | 114 |
| Few-shot classification | ModelNet40 5-way 20-shot | Accuracy 98.3 | 79 |
| Few-shot classification | ModelNet40 5-way 10-shot | Accuracy 96.8 | 79 |
| Few-shot classification | ModelNet40 10-way 20-shot | Accuracy 95 | 79 |
| Few-shot classification | ModelNet40 10-way 10-shot | Accuracy 92.3 | 79 |
| 3D Object Classification | ScanObjectNN PB_T50_RS | OA 85.7 | 72 |
| 3D Object Classification | ScanObjectNN OBJ_ONLY | Overall Accuracy 89.2 | 69 |
| Point Cloud Classification | ScanObjectNN OBJ_BG | Overall Accuracy 91.2 | 64 |