Foundry: Distilling 3D Foundation Models for the Edge
About
Foundation models pre-trained with self-supervised learning (SSL) on large-scale datasets have become powerful general-purpose feature extractors. However, their size and computational cost make them prohibitive for deployment on edge devices such as robots and AR/VR headsets. Existing compression techniques like standard knowledge distillation create efficient 'specialist' models but sacrifice the downstream-agnostic generality that makes foundation models so valuable. In this paper, we introduce Foundation Model Distillation (FMD), a new paradigm for compressing large SSL models into compact, efficient, and faithful proxies that retain their general-purpose representational power. We present Foundry, the first implementation of FMD for 3D point clouds. Foundry trains a student to learn a compressed set of SuperTokens that reconstruct the teacher's token-level representations, capturing a compact basis of its latent space. A single distilled model maintains strong transferability across diverse downstream tasks (classification, part segmentation, and few-shot scenarios), approaching full foundation-model performance while using significantly fewer tokens and FLOPs, making such models more practical for deployment on resource-constrained hardware.
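The abstract does not spell out how SuperTokens are produced or how the teacher's tokens are reconstructed from them. As a rough illustration only, the sketch below assumes a cross-attention formulation: a small set of hypothetical query vectors (`super_queries`) pools the student's N tokens into K SuperTokens, the N student tokens then attend back over those K SuperTokens to rebuild N token features, and a mean-squared error against the teacher's tokens serves as the distillation loss. The function names, the attention-based pooling, and the MSE objective are assumptions for illustration, not the paper's confirmed design.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, keys, values):
    """Scaled dot-product attention on plain lists: one output row per query."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)])
    return out


def reconstruction_loss(teacher_tokens, student_tokens, super_queries):
    """Hypothetical FMD-style objective (assumed, not taken from the paper).

    1. Compress: K queries pool the N student tokens into K SuperTokens.
    2. Reconstruct: each student token attends over the K SuperTokens.
    3. Loss: mean-squared error between reconstruction and teacher tokens.
    """
    super_tokens = attend(super_queries, student_tokens, student_tokens)  # K x D
    reconstructed = attend(student_tokens, super_tokens, super_tokens)    # N x D
    n, d = len(teacher_tokens), len(teacher_tokens[0])
    squared_error = sum(
        (r - t) ** 2
        for rec_row, tea_row in zip(reconstructed, teacher_tokens)
        for r, t in zip(rec_row, tea_row)
    )
    return super_tokens, reconstructed, squared_error / (n * d)


# Toy usage with random features: N = 8 tokens, D = 4 channels, K = 2 SuperTokens.
rng = random.Random(0)
teacher = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
student = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
queries = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(2)]
super_tokens, reconstructed, loss = reconstruction_loss(teacher, student, queries)
print(len(super_tokens), len(reconstructed), loss)
```

In a real training loop the student encoder and the queries would be learned by gradient descent on this loss; the sketch only evaluates the forward pass. Under this assumed formulation, the cost of any downstream attention over the compressed representation scales with K rather than N, which is one plausible reading of the "fewer tokens and FLOPs" claim.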
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Classification | ModelNet40 | Accuracy | 95.2 | 108 |
| Point Cloud Classification | ScanObjectNN PB_T50_RS | -- | -- | 100 |
| Point Cloud Classification | ScanObjectNN OBJ_BG | Overall Accuracy | 86.23 | 66 |
| Point Cloud Classification | ScanObjectNN OBJ-ONLY | Overall Accuracy | 86.29 | 52 |
| 3D Point Cloud Classification | MN40 | Accuracy | 91.76 | 21 |
| 3D Point Cloud Classification | OO3D | Accuracy | 77.3 | 4 |
| 3D Point Cloud Part Segmentation | SNP | mIoUC | 81.88 | 4 |
| 3D Point Cloud Classification | SN55 | Accuracy | 89.65 | 1 |