Sparse Code Uplifting for Efficient 3D Language Gaussian Splatting

About

3D Language Gaussian Splatting (3DLGS) augments 3D Gaussian Splatting with language-aligned visual features for open-vocabulary 3D scene understanding. A core challenge is efficiently associating high-dimensional vision-language embeddings with millions of 3D Gaussians while preserving efficient feature rendering for text-based querying. Existing methods either store dense features directly on Gaussians, causing high storage costs and slow rendering, or learn compact representations through expensive per-scene optimization with repeated feature rasterization. No existing method simultaneously achieves fast 3D semantic reconstruction, efficient storage, and fast rendering. We propose SCOUP (Sparse COde UPlifting), which addresses all three by decoupling language representation learning from 3D Gaussian optimization. Rather than working directly in 3D, we learn sparse codebook-based representations entirely using features associated with 2D image regions, associating each region with a sparse set of codebook coefficients. We then uplift these coefficients to 3D Gaussians with our weighted sparse aggregation using Gaussian-to-pixel associations, where each Gaussian accumulates coefficients over codebook atoms across views. Top-$K$ filtering then extracts the most dominant multi-view coefficients per Gaussian, enabling efficient storage and fast rendering. Our method achieves up to $400\times$ training speedup while being $3\times$ more memory efficient during training compared to the state-of-the-art in rendering speed. Across multiple benchmarks, SCOUP matches or outperforms existing methods in open-vocabulary querying accuracy.
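The uplifting and Top-$K$ steps described above can be illustrated with a minimal NumPy sketch. It is written from the abstract alone and is not the authors' implementation. The function names (`uplift_topk`, `decode`), the dense `weights` matrix standing in for Gaussian-to-pixel associations, and the linear decoding of features from codebook atoms are all assumptions made for illustration.

```python
import numpy as np


def uplift_topk(weights, pixel_region, region_codes, k):
    """Accumulate region-level sparse codes onto Gaussians, then keep the top-k atoms.

    weights      : (G, P) Gaussian-to-pixel association weights, with pixels from all
                   views flattened into one axis (hypothetical dense stand-in).
    pixel_region : (P,) index of the 2D region each pixel belongs to.
    region_codes : (R, A) coefficients of each region over A codebook atoms.
    k            : number of atoms kept per Gaussian.
    """
    # Every pixel inherits the coefficients of its region: (P, A).
    pixel_codes = region_codes[pixel_region]
    # Weighted accumulation over all pixels and views: (G, A).
    accumulated = weights @ pixel_codes
    # Select the k largest coefficients per Gaussian (unordered), then sort them.
    top_idx = np.argpartition(-accumulated, k - 1, axis=1)[:, :k]
    top_val = np.take_along_axis(accumulated, top_idx, axis=1)
    order = np.argsort(-top_val, axis=1)
    top_idx = np.take_along_axis(top_idx, order, axis=1)
    top_val = np.take_along_axis(top_val, order, axis=1)
    return top_idx, top_val


def decode(top_idx, top_val, codebook):
    """Rebuild a per-Gaussian feature as a weighted sum of its k atoms (assumed decoding)."""
    # codebook[top_idx] has shape (G, k, D); weight each atom and sum over k.
    return np.einsum("gk,gkd->gd", top_val, codebook[top_idx])


if __name__ == "__main__":
    # Toy scene: 2 Gaussians, 3 pixels, 2 regions, 4 codebook atoms.
    weights = np.array([[1.0, 0.0, 2.0],
                        [0.0, 3.0, 0.0]])
    pixel_region = np.array([0, 1, 0])
    region_codes = np.array([[0.5, 0.0, 0.2, 0.0],
                             [0.0, 1.0, 0.0, 0.3]])
    idx, val = uplift_topk(weights, pixel_region, region_codes, k=2)
    print(idx)
    print(val)
    print(decode(idx, val, np.eye(4)))
```

Under these assumptions each Gaussian stores only `k` atom indices and `k` coefficients rather than a dense embedding, which is where the storage saving in the abstract would come from. A real pipeline would presumably take the association weights from the rasterizer and keep them sparse instead of materializing a dense matrix.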

Lovre Antonio Budimir, Yushi Guan, Steve Ryhner, Sven Lončarić, Nandita Vijaykumar • 2026

Related benchmarks

Task	Dataset	Metric	Result
3D Semantic Segmentation	3D-OVS	Bed	87.7	55
3D Object Localization	LERF	Ramen Success Rate	74.7	14
3D Semantic Segmentation	LERF	mIoU (Ramen)	57.6	9
3D Semantic Segmentation	Mip-NeRF 360	Room Accuracy	66.5	5
Semantic 3D Reconstruction	Mip-NeRF 360	Reconstruction Time (2D to 3D)	1.4	2
Semantic 3D Reconstruction	3D-OVS	2D to 3D Time (min)	0.5	2
