Foundry: Distilling 3D Foundation Models for the Edge

About

Foundation models pre-trained with self-supervised learning (SSL) on large-scale datasets have become powerful general-purpose feature extractors. However, their immense size and computational cost make them prohibitive for deployment on edge devices such as robots and AR/VR headsets. Existing compression techniques like standard knowledge distillation create efficient 'specialist' models but sacrifice the crucial, downstream-agnostic generality that makes foundation models so valuable. In this paper, we introduce Foundation Model Distillation (FMD), a new paradigm for compressing large SSL models into compact, efficient, and faithful proxies that retain their general-purpose representational power. We present Foundry, the first implementation of FMD for 3D point clouds. Foundry trains a student to learn a compressed set of SuperTokens that reconstruct the teacher's token-level representations, capturing a compact basis of its latent space. A single distilled model maintains strong transferability across diverse downstream tasks (classification, part segmentation, and few-shot scenarios), approaching full foundation-model performance while using significantly fewer tokens and FLOPs, making such models more practical for deployment on resource-constrained hardware.

Guillaume Letellier, Siddharth Srivastava, Frédéric Jurie, Gaurav Sharma • 2025
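
The compress-then-reconstruct idea in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not say how SuperTokens are computed or how reconstruction is done, so everything here is an assumption made for illustration: single-head cross-attention for both steps, a mean-squared-error objective, and the hypothetical names `cross_attention`, `supertoken_reconstruction_loss`, and `supertoken_queries`. It shows only a forward pass and a loss value, not the training procedure used by Foundry.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # Single-head scaled dot-product attention: each query row
    # returns a weighted mix of the value rows.
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)
    return weights @ values

def supertoken_reconstruction_loss(student_tokens, teacher_tokens, supertoken_queries):
    # Compress (assumed mechanism): K queries attend over the N student
    # tokens, giving K SuperTokens with K much smaller than N.
    supertokens = cross_attention(supertoken_queries, student_tokens, student_tokens)
    # Reconstruct (assumed mechanism): each of the N token positions
    # attends over the K SuperTokens to rebuild a token-level feature.
    reconstructed = cross_attention(student_tokens, supertokens, supertokens)
    # Assumed objective: mean squared error against the teacher's tokens.
    loss = float(np.mean((reconstructed - teacher_tokens) ** 2))
    return supertokens, reconstructed, loss

# Random stand-ins for real features: N tokens, K SuperTokens, D channels.
rng = np.random.default_rng(0)
N, K, D = 64, 8, 32
teacher_tokens = rng.normal(size=(N, D))
student_tokens = rng.normal(size=(N, D))
supertoken_queries = rng.normal(size=(K, D))

supertokens, reconstructed, loss = supertoken_reconstruction_loss(
    student_tokens, teacher_tokens, supertoken_queries
)
print(supertokens.shape, reconstructed.shape)  # (8, 32) (64, 32)
```

In this sketch, downstream layers would operate on the K SuperTokens rather than all N tokens, which is where a reduction in tokens and FLOPs would come from.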

Related benchmarks

Task	Dataset	Result
Classification	ModelNet40	Accuracy 95.2	108
Point Cloud Classification	ScanObjectNN PB_T50_RS	--	100
Point Cloud Classification	ScanObjectNN OBJ_BG	Overall Accuracy 86.23	66
Point Cloud Classification	ScanObjectNN OBJ-ONLY	Overall Accuracy 86.29	52
3D Point Cloud Classification	MN40	Accuracy 91.76	21
3D Point Cloud Classification	OO3D	Accuracy 77.3	4
3D Point Cloud Part Segmentation	SNP	mIoUC 81.88	4
3D Point Cloud Classification	SN55	Accuracy 89.65	1
