An Adapter-free Fine-tuning Approach for Tuning 3D Foundation Models
About
Point cloud foundation models demonstrate strong generalization, yet adapting them to downstream tasks remains challenging in low-data regimes. Full fine-tuning often leads to overfitting and significant drift from pre-trained representations, while existing parameter-efficient fine-tuning (PEFT) methods mitigate this issue by introducing additional trainable components at the cost of increased inference-time latency. We propose Momentum-Consistency Fine-Tuning (MCFT), an adapter-free approach that bridges the gap between full and parameter-efficient fine-tuning. MCFT selectively fine-tunes a portion of the pre-trained encoder while enforcing a momentum-based consistency constraint to preserve task-agnostic representations. Unlike PEFT methods, MCFT introduces no additional representation learning parameters beyond a standard task head, maintaining the original model's parameter count and inference efficiency. We further extend MCFT with two variants: a semi-supervised framework that leverages abundant unlabeled data to enhance few-shot performance, and a pruning-based variant that improves computational efficiency through structured layer removal. Extensive experiments on object recognition and part segmentation benchmarks demonstrate that MCFT consistently outperforms prior methods, achieving a 3.30% gain in 5-shot settings and up to a 6.13% improvement with semi-supervised learning, while remaining well-suited for resource-constrained deployment.
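The abstract does not spell out how the momentum-based consistency constraint is computed, so the following is only a minimal sketch of one common way such a constraint is built, not the authors' implementation. It assumes an exponential-moving-average (EMA) copy of the encoder weights, a mean-squared distance between the features of the fine-tuned encoder and those of the EMA copy, and a weighting factor `lam` that trades off the task loss against the consistency term. All function names, the momentum value `m`, and the choice of squared error are hypothetical.

```python
from typing import List


def ema_update(momentum: List[float], online: List[float], m: float = 0.99) -> List[float]:
    """Move the momentum (slow) weights a small step toward the online weights.

    Assumption: a plain EMA, w_m <- m * w_m + (1 - m) * w_o.
    """
    if len(momentum) != len(online):
        raise ValueError("parameter lists must have the same length")
    return [m * w_m + (1.0 - m) * w_o for w_m, w_o in zip(momentum, online)]


def consistency_loss(online_feat: List[float], momentum_feat: List[float]) -> float:
    """Mean squared difference between the two feature vectors.

    Assumption: squared error is used here for illustration only.
    """
    if len(online_feat) != len(momentum_feat) or not online_feat:
        raise ValueError("feature vectors must be non-empty and equally long")
    squared = [(a - b) ** 2 for a, b in zip(online_feat, momentum_feat)]
    return sum(squared) / len(squared)


def total_loss(task_loss: float, online_feat: List[float],
               momentum_feat: List[float], lam: float = 1.0) -> float:
    """Task loss plus a weighted penalty for drifting from the momentum features."""
    return task_loss + lam * consistency_loss(online_feat, momentum_feat)
```

In a sketch like this, only the selected encoder layers and the task head would receive gradients, while the momentum copy is updated with `ema_update` after each step and discarded at inference time, which would be consistent with the claim that no extra parameters remain in the deployed model.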
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Part Segmentation | ShapeNetPart | mIoU (Instance) | 89 | 246 |
| 3D Object Classification | ModelNet40 few-shot | Accuracy | 82.93 | 70 |
| Classification | ScanObjectNN | OA | 93.1 | 67 |
| Object Recognition | ModelNet40 5-way | Accuracy | 98.3 | 40 |
| Object Recognition | ModelNet40 10-way | Accuracy | 95.9 | 30 |
| Object Recognition | ScanObjectNN fully-supervised (PB) | Overall Accuracy (OA) | 90.8 | 28 |
| Object Recognition | ModelNet40 fully-supervised (test) | Overall Accuracy (OA) | 95.2 | 26 |
| Object Recognition | ScanObjectNN fully-supervised (BG) | Overall Accuracy (OA) | 94.9 | 24 |
| Object Recognition | ModelNet40 20-shot | Accuracy (20-shot) | 86.83 | 10 |
| Object Recognition | ScanObjectNN OBJ_ONLY 5-shot | Accuracy | 61.1 | 10 |