PointLAMA: Latent Attention meets Mamba for Efficient Point Cloud Pretraining
About
Mamba has recently gained widespread attention as a backbone for point cloud modeling, leveraging a state-space architecture that enables efficient global sequence modeling with linear complexity. However, its lack of local inductive bias limits its capacity to capture fine-grained geometric structures in 3D data. To address this limitation, we propose PointLAMA, a point cloud pretraining framework that combines task-aware point cloud serialization, a hybrid encoder with integrated Latent Attention and Mamba blocks, and a conditional diffusion mechanism built upon the Mamba backbone. Specifically, the task-aware point cloud serialization employs Hilbert/Trans-Hilbert space-filling curves and axis-wise sorting to structurally align point tokens for classification and segmentation tasks, respectively. Our lightweight Latent Attention block features a Point-wise Multi-head Latent Attention (PMLA) module, which is designed to align with the Mamba architecture by exploiting the latent-space characteristics that PMLA and Mamba share. This enhances local context modeling while preserving overall efficiency. To further improve representation learning, we incorporate a conditional diffusion mechanism during pretraining, which denoises perturbed feature sequences without relying on explicit point-wise reconstruction. Experimental results demonstrate that PointLAMA achieves competitive performance on multiple benchmark datasets with minimal parameter count and FLOPs, validating its effectiveness for efficient point cloud pretraining.
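The axis-wise sorting mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `axis_wise_serialize`, the `order` parameter, and the choice of a plain lexicographic sort over raw coordinates are assumptions made for illustration only. The idea shown is simply that an unordered point set is turned into a 1D token sequence by sorting along one axis first and breaking ties with the remaining axes.

```python
import numpy as np

def axis_wise_serialize(points: np.ndarray, order: str = "xyz") -> np.ndarray:
    """Return a permutation that sorts an (N, 3) point array lexicographically.

    Hypothetical helper: `order` names the primary axis first, e.g. "xyz"
    sorts by x, then by y, then by z.
    """
    axis_index = {"x": 0, "y": 1, "z": 2}
    # np.lexsort uses the LAST key as the primary key, so keys are reversed.
    keys = tuple(points[:, axis_index[a]] for a in reversed(order))
    return np.lexsort(keys)

# Toy point cloud with four points (illustrative values, not real data).
points = np.array([
    [0.5, 0.1, 0.9],
    [0.1, 0.7, 0.3],
    [0.5, 0.0, 0.2],
    [0.1, 0.2, 0.8],
])

perm_xyz = axis_wise_serialize(points, order="xyz")
perm_zyx = axis_wise_serialize(points, order="zyx")
serialized = points[perm_xyz]  # points reordered into a 1D sequence
```

Different axis orders give different sequences over the same points, which is why a sequence model such as Mamba is sensitive to the serialization choice. A Hilbert-curve ordering would replace the sort keys with curve indices computed from quantized coordinates.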
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Few-shot classification | ModelNet40 10-way 10-shot | Accuracy | 94 | 79 |
| Few-shot classification | ModelNet40 5-way 20-shot | Accuracy | 99 | 79 |
| Few-shot classification | ModelNet40 10-way 20-shot | Accuracy | 95.8 | 79 |
| Few-shot classification | ModelNet40 5-way 10-shot | Accuracy | 97.2 | 79 |
| 3D Object Classification | ModelNet40 1k P | Accuracy | 94.5 | 61 |
| 3D Object Classification | ScanObjectNN PB_T50_RS (FULL Protocol) | Accuracy | 89.53 | 25 |
| 3D Object Classification | ScanObjectNN OBJ_BG (FULL Protocol) | Accuracy | 94.51 | 23 |
| 3D Object Classification | ScanObjectNN OBJ_ONLY (FULL Protocol) | Accuracy | 92.86 | 23 |