
Sparse Code Uplifting for Efficient 3D Language Gaussian Splatting

About

3D Language Gaussian Splatting (3DLGS) augments 3D Gaussian Splatting with language-aligned visual features for open-vocabulary 3D scene understanding. A core challenge is efficiently associating high-dimensional vision-language embeddings with millions of 3D Gaussians while preserving efficient feature rendering for text-based querying. Existing methods either store dense features directly on Gaussians, causing high storage costs and slow rendering, or learn compact representations through expensive per-scene optimization with repeated feature rasterization. No existing method simultaneously achieves fast 3D semantic reconstruction, efficient storage, and fast rendering. We propose SCOUP (Sparse COde UPlifting), which addresses all three by decoupling language representation learning from 3D Gaussian optimization. Rather than working directly in 3D, we learn sparse codebook-based representations entirely using features associated with 2D image regions, associating each region with a sparse set of codebook coefficients. We then uplift these coefficients to 3D Gaussians with our weighted sparse aggregation using Gaussian-to-pixel associations, where each Gaussian accumulates coefficients over codebook atoms across views. Top-$K$ filtering then extracts the most dominant multi-view coefficients per Gaussian, enabling efficient storage and fast rendering. Our method achieves up to $400\times$ training speedup while being $3\times$ more memory efficient during training compared to the state-of-the-art in rendering speed. Across multiple benchmarks, SCOUP matches or outperforms existing methods in open-vocabulary querying accuracy.
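The uplifting and Top-$K$ steps described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `uplift_sparse_codes`, the dictionary-based data layout, the use of a per-association scalar weight, and the normalization by the summed weights are all assumptions made for illustration. The abstract states only that each Gaussian accumulates weighted coefficients over codebook atoms across views and that the most dominant ones are kept.

```python
from collections import defaultdict
import heapq


def uplift_sparse_codes(region_codes, associations, k):
    """Aggregate sparse 2D region codes onto 3D Gaussians (illustrative sketch).

    region_codes: {region_id: {atom_id: coefficient}}, sparse codes per 2D region.
    associations: iterable of (gaussian_id, region_id, weight) tuples, a
        hypothetical stand-in for Gaussian-to-pixel associations across views.
    k: number of codebook atoms to keep per Gaussian.
    Returns {gaussian_id: [(atom_id, coefficient), ...]}, largest |coefficient| first.
    """
    acc = defaultdict(lambda: defaultdict(float))  # per-Gaussian sparse accumulator
    weight_sum = defaultdict(float)                # per-Gaussian total weight

    for gaussian_id, region_id, weight in associations:
        weight_sum[gaussian_id] += weight
        for atom_id, coeff in region_codes[region_id].items():
            acc[gaussian_id][atom_id] += weight * coeff

    result = {}
    for gaussian_id, atoms in acc.items():
        total = weight_sum[gaussian_id]
        if total <= 0:
            continue  # no usable observations for this Gaussian
        # Assumption: normalize by the summed weights (a weighted average).
        normalized = ((a, c / total) for a, c in atoms.items())
        # Top-K filtering: keep the k atoms with the largest magnitude.
        result[gaussian_id] = heapq.nlargest(
            k, normalized, key=lambda pair: abs(pair[1])
        )
    return result


# Toy example: two regions, two Gaussians, keep two atoms per Gaussian.
region_codes = {0: {0: 1.0, 3: 0.5}, 1: {1: 2.0}}
associations = [(0, 0, 0.75), (0, 1, 0.25), (1, 1, 1.0)]
codes = uplift_sparse_codes(region_codes, associations, k=2)
```

Because only $K$ (atom index, coefficient) pairs are stored per Gaussian, storage in this sketch grows with $K$ rather than with the embedding dimension, which is the property the abstract attributes to Top-$K$ filtering.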

Lovre Antonio Budimir, Yushi Guan, Steve Ryhner, Sven Lončarić, Nandita Vijaykumar • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| 3D Semantic Segmentation | 3D-OVS | Bed | 87.7 | 42 |
| 3D Object Localization | LERF | Ramen Success Rate | 74.7 | 14 |
| 3D Semantic Segmentation | LERF | mIoU (Ramen) | 57.6 | 9 |
| 3D Semantic Segmentation | Mip-NeRF 360 | Room Accuracy | 66.5 | 5 |
| Semantic 3D Reconstruction | Mip-NeRF 360 | Reconstruction Time (2D to 3D) | 1.4 | 2 |
| Semantic 3D Reconstruction | 3D-OVS | 2D to 3D Time (min) | 0.5 | 2 |
