Chorus: Multi-Teacher Pretraining for Holistic 3D Gaussian Scene Encoding
About
While 3D Gaussian Splatting (3DGS) has emerged as a high-fidelity scene representation, encoding rich, general-purpose features directly from its primitives remains under-explored. We address this gap by introducing Chorus, a multi-teacher pretraining framework that learns a holistic feed-forward 3DGS scene encoder by distilling complementary signals from 2D foundation models. Chorus employs a shared 3D encoder and teacher-specific projectors to learn from language-aligned, generalist, and object-aware teachers, encouraging a shared embedding space that captures signals ranging from high-level semantics to fine-grained structure. We evaluate Chorus on a wide range of tasks: open-vocabulary semantic and instance segmentation, linear and decoder probing, data-efficient supervision, and LLM-based Q&A. Beyond 3DGS, we also evaluate Chorus on several benchmarks that support only point clouds, by pretraining a variant that uses only Gaussian centers, colors, and estimated normals. Surprisingly, this encoder shows strong transfer and outperforms the point-cloud baseline while using 39.9× fewer training scenes. Finally, we propose a render-and-distill adaptation that facilitates out-of-domain finetuning.
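The shared-encoder and teacher-specific-projector layout described above can be illustrated with a small sketch. This is not the authors' implementation: the linear layers, the cosine-distance objective, the feature dimensions, the teacher names, and the assumption that each teacher's 2D features have already been associated with individual Gaussians are all hypothetical choices made only to show how one encoder can serve several teachers.

```python
import math
import random

def linear(weights, x):
    """Apply a bias-free dense layer: weights is a list of rows, x a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def cosine_distance(a, b):
    """1 - cosine similarity; close to 0 when the vectors are aligned."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return 1.0 - dot / (norm + 1e-12)

def init_matrix(rows, cols, rng):
    """Small random weight matrix (stand-in for learned parameters)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def multi_teacher_loss(encoder, projectors, gaussians, teacher_targets):
    """Mean cosine distance over all (Gaussian, teacher) pairs.

    Assumed objective: the abstract does not state the actual loss.
    """
    total, count = 0.0, 0
    for i, g in enumerate(gaussians):
        shared = linear(encoder, g)          # shared 3D embedding
        for name, proj in projectors.items():
            pred = linear(proj, shared)      # teacher-specific projection
            total += cosine_distance(pred, teacher_targets[name][i])
            count += 1
    return total / count

rng = random.Random(0)
IN_DIM, SHARED_DIM = 14, 8  # hypothetical per-Gaussian attribute and embedding sizes
TEACHER_DIMS = {"language": 6, "generalist": 5, "object": 4}  # hypothetical teachers

encoder = init_matrix(SHARED_DIM, IN_DIM, rng)
projectors = {n: init_matrix(d, SHARED_DIM, rng) for n, d in TEACHER_DIMS.items()}

# Three toy Gaussians and random stand-ins for per-Gaussian teacher features.
gaussians = [[rng.gauss(0.0, 1.0) for _ in range(IN_DIM)] for _ in range(3)]
teacher_targets = {
    n: [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(3)]
    for n, d in TEACHER_DIMS.items()
}

loss = multi_teacher_loss(encoder, projectors, gaussians, teacher_targets)
print(f"multi-teacher distillation loss: {loss:.4f}")
```

Because every projector reads from the same `shared` embedding, gradients from all teachers would flow into one encoder during training, which is the mechanism the abstract credits for a shared embedding space; the projectors absorb the differences in each teacher's feature dimension.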
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic segmentation | ScanNet (val) | mIoU 79.4 | 302 |
| Semantic segmentation | ScanNet200 (val) | mIoU 40.9 | 136 |
| Instance Segmentation | ScanNet200 (val) | mAP@50 33.7 | 72 |
| 3D Instance Segmentation | ScanNet200 | mAP@0.5 18 | 63 |
| Instance Segmentation | ScanNetV2 (val) | mAP@0.5 63.4 | 58 |
| Semantic segmentation | ScanNet++ (val) | mIoU 52.9 | 32 |
| Instance Segmentation | ScanNet++ (val) | -- | 24 |
| 3D Semantic Segmentation | ScanNet200 (test) | mIoU (f) 24.6 | 15 |
| 3D Semantic Segmentation | Matterport3D 160 classes (test) | f-mIoU 18.7 | 8 |
| 3D Semantic Segmentation | ScanNet++ 100 classes (test) | f-mIoU 29.6 | 8 |