Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting
About
Recent advancements in computer vision have successfully extended open-vocabulary segmentation (OVS) to the 3D domain by leveraging 3D Gaussian Splatting (3D-GS). Despite this progress, efficiently rendering the high-dimensional features required for open-vocabulary queries remains a significant challenge. Existing methods employ codebooks or feature compression, which cause information loss and degrade segmentation quality. To address this limitation, we introduce Quantile Rendering (Q-Render), a novel rendering strategy for 3D Gaussians that efficiently handles high-dimensional features while maintaining high fidelity. Unlike conventional volume rendering, which densely samples all 3D Gaussians intersecting each ray, Q-Render sparsely samples only those with dominant influence along the ray. By integrating Q-Render into a generalizable 3D neural network, we also propose Gaussian Splatting Network (GS-Net), which predicts Gaussian features in a generalizable manner. Extensive experiments on ScanNet and LERF demonstrate that our framework outperforms state-of-the-art methods while enabling real-time rendering, with a speedup of approximately 43.7x on 512-D feature maps. Code will be made publicly available.
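To make the contrast between dense volume rendering and sparse sampling concrete, here is a minimal single-ray sketch in plain Python. It is not the authors' implementation: the abstract does not specify how the dominant Gaussians are chosen, so the quantile rule below (pick the Gaussian at which the accumulated blending weight crosses each of `k` evenly spaced levels) and the function names `blend_weights`, `volume_render`, and `quantile_render` are assumptions made for illustration only.

```python
def blend_weights(alphas):
    """Front-to-back compositing weights: w_i = alpha_i * prod_{j<i} (1 - alpha_j)."""
    weights, transmittance = [], 1.0
    for a in alphas:
        weights.append(a * transmittance)
        transmittance *= 1.0 - a
    return weights


def volume_render(alphas, feats):
    """Dense baseline: blend the feature of every Gaussian along the ray."""
    weights = blend_weights(alphas)
    dim = len(feats[0])
    return [sum(w * f[d] for w, f in zip(weights, feats)) for d in range(dim)]


def quantile_render(alphas, feats, k):
    """Sparse sketch (assumed rule): read features only at k quantile crossings.

    Only the scalar opacities are visited for every Gaussian; the
    high-dimensional features are read for at most k of them.
    """
    weights = blend_weights(alphas)
    total = sum(weights)
    dim = len(feats[0])
    if total == 0.0:
        return [0.0] * dim

    # Accumulated blending weight after each Gaussian (non-decreasing).
    cumulative, running = [], 0.0
    for w in weights:
        running += w
        cumulative.append(running)

    # For each quantile level, pick the first Gaussian whose accumulated
    # weight reaches it. Levels increase, so one forward sweep suffices.
    picked, idx = [], 0
    for q in range(k):
        level = (q + 0.5) / k * total
        while idx < len(cumulative) - 1 and cumulative[idx] < level:
            idx += 1
        picked.append(idx)

    # Average the picked features and rescale by the total accumulated weight.
    return [total * sum(feats[i][d] for i in picked) / k for d in range(dim)]


if __name__ == "__main__":
    alphas = [0.5, 0.5]
    feats = [[1.0], [3.0]]
    print(volume_render(alphas, feats))
    print(quantile_render(alphas, feats, k=3))
```

Under this assumed rule, the number of high-dimensional feature reads per ray drops from the number of intersecting Gaussians to at most `k`, which is the kind of saving the abstract attributes to sparse sampling. The sparse result approximates the dense one, and the approximation tightens as `k` grows.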
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Open Vocabulary Semantic Segmentation | LERF-OVS | mIoU | 45.8 | 6 |
| Open-Vocabulary 3D Semantic Segmentation | ScanNet 19 classes v2 | mIoU | 50.75 | 5 |
| Open-Vocabulary 3D Semantic Segmentation | ScanNet 15 classes v2 | mIoU | 53.54 | 5 |
| Open-Vocabulary 3D Semantic Segmentation | ScanNet 10 classes v2 | mIoU | 64.95 | 5 |
| Open-Vocabulary 3D Semantic Segmentation | MipNeRF360 Outdoor | mIoU (Bicycle) | 22.36 | 2 |