CF3: Compact and Fast 3D Feature Fields
About
3D Gaussian Splatting (3DGS) has begun incorporating rich information from 2D foundation models. However, most approaches rely on a bottom-up optimization process that treats raw 2D features as ground truth, incurring increased computational costs. We propose CF3, a top-down pipeline for constructing compact and fast 3D Gaussian feature fields. We first perform a fast weighted fusion of multi-view 2D features with pre-trained Gaussians. This approach enables training a per-Gaussian autoencoder directly on the lifted features, instead of training autoencoders in the 2D domain. As a result, the autoencoder better aligns with the feature distribution. More importantly, we introduce an adaptive sparsification method that optimizes the Gaussian attributes of the feature field while pruning and merging redundant Gaussians, constructing an efficient representation that preserves geometric details. Our approach achieves a competitive 3D feature field using as few as 5% of the Gaussians used by Feature-3DGS.
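The weighted fusion step can be illustrated with a minimal sketch. It is not the CF3 implementation: the function name `fuse_features` and its inputs are hypothetical, and it assumes that each (Gaussian, pixel) contribution across all views has already been collected together with a blending weight, for example the weight that Gaussian contributes to the pixel during rendering. Under those assumptions, each Gaussian's lifted feature is the weight-normalized average of the 2D features it touches.

```python
import numpy as np


def fuse_features(gauss_ids, weights, pixel_feats, num_gaussians):
    """Weighted average of 2D features per Gaussian (illustrative only).

    gauss_ids:   (N,) int array, Gaussian index of each contribution
    weights:     (N,) float array, assumed blending weight of that
                 Gaussian at the pixel (hypothetical input)
    pixel_feats: (N, D) float array, 2D feature at the pixel
    Returns (fused, seen): (num_gaussians, D) features and a boolean
    mask of Gaussians that received at least one contribution.
    """
    dim = pixel_feats.shape[1]
    feat_sum = np.zeros((num_gaussians, dim))
    weight_sum = np.zeros(num_gaussians)

    # np.add.at accumulates correctly when an index appears repeatedly.
    np.add.at(feat_sum, gauss_ids, weights[:, None] * pixel_feats)
    np.add.at(weight_sum, gauss_ids, weights)

    # Normalize only Gaussians with nonzero total weight; the rest stay zero.
    seen = weight_sum > 0
    fused = np.zeros_like(feat_sum)
    fused[seen] = feat_sum[seen] / weight_sum[seen, None]
    return fused, seen


# Toy input: Gaussian 0 is hit twice, Gaussian 1 once, Gaussian 2 never.
ids = np.array([0, 0, 1])
w = np.array([1.0, 3.0, 2.0])
feats = np.array([[0.0, 0.0], [4.0, 8.0], [1.0, 1.0]])
fused, seen = fuse_features(ids, w, feats, num_gaussians=3)
```

Because this is a single accumulation pass rather than an optimization loop, it needs no gradient steps. The fused per-Gaussian features could then serve as training data for a per-Gaussian autoencoder, as the abstract describes.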
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Segmentation | Mip-NeRF 360 | mIoU | 59.2 | 31 |
| 3D Semantic Segmentation | LERF (test) | mIoU | 54 | 13 |
| 3D Scene Reconstruction | LERF average across four scenes | PSNR | 23.84 | 12 |
| 3D Scene Reconstruction | Mip-NeRF360 average across four scenes | PSNR | 27.02 | 9 |
| 3D scene understanding | Replica (Target View) | LSeg mIoU | 66.3 | 5 |
| 3D scene understanding | Replica (Source View) | LSeg mIoU | 66.4 | 5 |
| Open-Vocabulary Segmentation | ScanNet Target View | LSeg mIoU | 37.6 | 5 |
| Open-Vocabulary Segmentation | ScanNet Source View | LSeg mIoU | 39 | 5 |
| 3D Scene Reconstruction | ScanNet Target View | MaskCLIP PSNR | 20.14 | 4 |
| 3D Scene Reconstruction | ScanNet Source View | MaskCLIP PSNR | 23.16 | 4 |