Point-SRA: Self-Representation Alignment for 3D Representation Learning

About

Masked autoencoders (MAE) have become a dominant paradigm in 3D representation learning, setting new performance benchmarks across various downstream tasks. Existing methods that use a fixed mask ratio neglect multi-level representational correlations and intrinsic geometric structures, and they rely on point-wise reconstruction assumptions that conflict with the diversity of point clouds. To address these issues, we propose a 3D representation learning method, termed Point-SRA, which aligns representations through self-distillation and probabilistic modeling. Specifically, we assign different masking ratios to the MAE to capture complementary geometric and semantic information, while the MeanFlow Transformer (MFT) leverages cross-modal conditional embeddings to enable diverse probabilistic reconstruction. Our analysis further reveals that representations at different time steps in MFT also exhibit complementarity. Therefore, a Dual Self-Representation Alignment mechanism is proposed at both the MAE and MFT levels. Finally, we design a Flow-Conditioned Fine-Tuning Architecture to fully exploit the point cloud distribution learned via MeanFlow. Point-SRA outperforms Point-MAE by 5.37% on ScanObjectNN. On intracranial aneurysm segmentation, it reaches 96.07% mean IoU for arteries and 86.87% for aneurysms. For 3D object detection, Point-SRA achieves 47.3% AP@50, surpassing MaskPoint by 5.12%.
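The idea of masking the same input at different ratios and aligning the resulting representations can be illustrated with a toy sketch. This is not the authors' implementation: the mask ratios (0.3 and 0.8), the mean-pooling stand-in for an encoder, the cosine-distance alignment loss, and all function names are assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
import random


def random_mask(num_patches, mask_ratio, rng):
    """Return sorted indices of the patches left visible under mask_ratio."""
    num_masked = int(round(num_patches * mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)
    return sorted(order[num_masked:])


def mean_pool(features, indices):
    """Toy stand-in for an encoder: average the visible patch features."""
    dim = len(features[0])
    return [sum(features[i][d] for i in indices) / len(indices) for d in range(dim)]


def cosine_alignment_loss(a, b):
    """Assumed alignment objective: 1 - cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + 1e-12)


rng = random.Random(0)
num_patches, dim = 64, 8

# Hypothetical per-patch features of one point cloud (random for the demo).
patch_features = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_patches)]

# Two views of the same input under a low and a high mask ratio (assumed values).
visible_low = random_mask(num_patches, 0.3, rng)
visible_high = random_mask(num_patches, 0.8, rng)

rep_low = mean_pool(patch_features, visible_low)
rep_high = mean_pool(patch_features, visible_high)

# The alignment term pulls the two representations toward each other.
loss = cosine_alignment_loss(rep_low, rep_high)
print(f"visible patches: {len(visible_low)} vs {len(visible_high)}, loss = {loss:.4f}")
```

In a real self-distillation setup the two branches would be learned encoders, typically with gradients stopped on the branch acting as teacher; the sketch only shows where the two mask ratios and the alignment term enter.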

Lintong Wei, Jian Lu, Haozhe Cheng, Jihua Zhu, Kaibing Zhang • 2026

Related benchmarks

Task	Dataset	Metric	Result
Semantic segmentation	S3DIS (Area 5)	mIoU	71.8	1006
Part Segmentation	ShapeNetPart (test)	mIoU (Inst.)	86.7	347
Object Classification	ScanObjectNN OBJ_BG	Accuracy	95.53	248
Object Classification	ScanObjectNN PB_T50_RS	Accuracy	90.77	220
Object Classification	ScanObjectNN OBJ_ONLY	Overall Accuracy	93.31	186
Few-shot 3D Classification	ModelNet40 (test)	Accuracy	99	92
Object Detection	ScanNet v2 (test)	AP@0.50	47.4	70
3D Object Classification	ModelNet 1k points 40 (test)	Accuracy	94.3	15
3D Object Classification	ModelNet 8k points 40 (test)	Accuracy	0.945	11
Classification	IntrA	V Score	1	9

Showing 10 of 11 rows
