Point-SRA: Self-Representation Alignment for 3D Representation Learning
About
Masked autoencoders (MAE) have become a dominant paradigm in 3D representation learning, setting new performance benchmarks across various downstream tasks. Existing methods that use a fixed mask ratio neglect multi-level representational correlations and intrinsic geometric structures, and they rely on point-wise reconstruction assumptions that conflict with the diversity of point clouds. To address these issues, we propose Point-SRA, a 3D representation learning method that aligns representations through self-distillation and probabilistic modeling. Specifically, we assign different masking ratios to the MAE to capture complementary geometric and semantic information, while the MeanFlow Transformer (MFT) leverages cross-modal conditional embeddings to enable diverse probabilistic reconstruction. Our analysis further reveals that representations at different time steps in MFT are also complementary. We therefore propose a Dual Self-Representation Alignment mechanism that operates at both the MAE and MFT levels. Finally, we design a Flow-Conditioned Fine-Tuning Architecture to fully exploit the point cloud distribution learned via MeanFlow. Point-SRA outperforms Point-MAE by 5.37% on ScanObjectNN. On intracranial aneurysm segmentation, it reaches 96.07% mean IoU for arteries and 86.87% for aneurysms. For 3D object detection, Point-SRA achieves 47.3% AP@50, surpassing MaskPoint by 5.12%.
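The idea of masking the same point cloud at two different ratios and aligning the resulting representations can be illustrated with a small NumPy sketch. It is not the Point-SRA implementation: the ratios (0.6 and 0.8), the mean-pooled linear "encoder" `encode_visible`, and the cosine alignment loss are all hypothetical stand-ins chosen for illustration. In real self-distillation one branch would act as a stop-gradient teacher inside an autodiff framework; here we only compute the loss value.

```python
import numpy as np


def random_patch_mask(num_patches: int, mask_ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Boolean mask of shape (num_patches,); True marks a masked (hidden) patch."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask


def cosine_alignment_loss(student: np.ndarray, teacher: np.ndarray, eps: float = 1e-8) -> float:
    """1 - mean cosine similarity between row-paired features of shape (N, D); range [0, 2]."""
    s = student / (np.linalg.norm(student, axis=1, keepdims=True) + eps)
    t = teacher / (np.linalg.norm(teacher, axis=1, keepdims=True) + eps)
    return float(1.0 - np.mean(np.sum(s * t, axis=1)))


def encode_visible(tokens: np.ndarray, mask: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in encoder: project the visible patch tokens and mean-pool them."""
    visible = tokens[~mask]                                 # (num_visible, d_in)
    return (visible @ weight).mean(axis=0, keepdims=True)   # (1, d_out)


rng = np.random.default_rng(0)
num_patches, d_in, d_out = 64, 32, 16
tokens = rng.normal(size=(num_patches, d_in))   # stand-in for point-patch embeddings
weight = rng.normal(size=(d_in, d_out))         # stand-in for encoder weights

low_mask = random_patch_mask(num_patches, 0.6, rng)   # hypothetical lower masking ratio
high_mask = random_patch_mask(num_patches, 0.8, rng)  # hypothetical higher masking ratio

z_low = encode_visible(tokens, low_mask, weight)
z_high = encode_visible(tokens, high_mask, weight)

# Align the heavily masked view (student) with the lightly masked view (teacher).
loss = cosine_alignment_loss(z_high, z_low)
print(int(low_mask.sum()), int(high_mask.sum()), round(loss, 4))
```

The two masking ratios hide different numbers of patches (38 and 51 of 64 here), so each view sees a different amount of geometric context; the alignment loss is what ties the two representations together.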
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic segmentation | S3DIS (Area 5) | mIoU | 71.8 | 799 |
| Part segmentation | ShapeNetPart (test) | mIoU (Inst.) | 86.7 | 312 |
| Object classification | ScanObjectNN OBJ_BG | Accuracy | 95.53 | 215 |
| Object classification | ScanObjectNN PB_T50_RS | Accuracy | 90.77 | 195 |
| Object classification | ScanObjectNN OBJ_ONLY | Overall Accuracy | 93.31 | 166 |
| Few-shot 3D classification | ModelNet40 (test) | Accuracy | 99 | 92 |
| Object detection | ScanNet v2 (test) | AP@0.50 | 47.4 | 70 |
| 3D object classification | ModelNet40, 1k points (test) | Accuracy | 94.3 | 15 |
| 3D object classification | ModelNet40, 8k points (test) | Accuracy | 0.945 | 11 |
| Classification | IntrA | V Score | 1 | 9 |
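Several results above (and the artery/aneurysm figures in the abstract) are reported as mean IoU. As a generic illustration, per-class IoU is TP / (TP + FP + FN), and mIoU averages it over classes. This sketch is not the official evaluation code of any benchmark listed here; protocols differ (for example, ShapeNetPart's "Inst." mIoU averages over shapes rather than classes), and the toy labels below are invented.

```python
import numpy as np


def per_class_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> np.ndarray:
    """IoU_c = |pred==c AND target==c| / |pred==c OR target==c|; NaN if class c is absent from both."""
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union > 0:
            ious[c] = np.logical_and(p, t).sum() / union
    return ious


def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Mean over classes, ignoring classes that never occur (NaN entries)."""
    return float(np.nanmean(per_class_iou(pred, target, num_classes)))


# Toy per-point labels (hypothetical): 0 = artery, 1 = aneurysm.
target = np.array([0, 0, 0, 1, 1, 0])
pred = np.array([0, 0, 1, 1, 1, 0])

print(per_class_iou(pred, target, 2), mean_iou(pred, target, 2))
# artery IoU 0.75, aneurysm IoU ~0.667, mean ~0.708
```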