LangFlash: Feed-forward 3D Language Gaussian Splatting from Sparse Unposed Images

About

We present LangFlash, a feed-forward framework for 3D Language Gaussian Splatting that reconstructs 3D scenes from sparse, unposed multi-view images as Gaussian primitives enriched with language-aligned semantic features. Unlike optimization-based 3D methods, LangFlash predicts geometry and semantics directly in a single forward pass, enabling low-latency 3D reconstruction and language-consistent scene understanding. To support large-scale training, we enrich the RealEstate10k dataset with coherent, dense semantic information for 3D semantic supervision. Furthermore, we propose a sparse semantic encoding scheme that combines a global semantic dictionary with locally varying per-primitive weights, preserving high-level linguistic information while reducing representation complexity. Experimental results show that LangFlash achieves better novel view synthesis and semantic consistency than previous methods. This study establishes a new paradigm for pose-free, language-grounded 3D scene reconstruction, advancing generalizable 3D vision and multimodal scene understanding. A demo is available at https://liylo.github.io/langflash.github.io/.
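
The abstract describes the sparse semantic encoding only at a high level, so the following is a minimal sketch of the general idea and not the authors' implementation. It assumes each primitive's semantic feature is a weighted sum of a few atoms from a shared dictionary. The sizes (NUM_ATOMS, FEAT_DIM, TOP_K), the top-k sparsification rule, and every function name are hypothetical choices made for illustration.

```python
import random


def make_dictionary(num_atoms, feat_dim, seed=0):
    """Global dictionary: num_atoms basis vectors shared by all primitives.

    Random values stand in for learned atoms (an assumption of this sketch).
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(feat_dim)]
            for _ in range(num_atoms)]


def sparsify(weights, k):
    """Keep the k largest-magnitude weights as (atom_index, weight) pairs."""
    order = sorted(range(len(weights)),
                   key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i]) for i in sorted(order[:k])]


def decode(sparse_weights, dictionary):
    """Rebuild one primitive's feature as a weighted sum of dictionary atoms."""
    feat_dim = len(dictionary[0])
    feature = [0.0] * feat_dim
    for atom_index, weight in sparse_weights:
        atom = dictionary[atom_index]
        for d in range(feat_dim):
            feature[d] += weight * atom[d]
    return feature


# Hypothetical sizes, chosen only to make the storage comparison concrete.
NUM_ATOMS, FEAT_DIM, TOP_K = 64, 512, 4

dictionary = make_dictionary(NUM_ATOMS, FEAT_DIM)

# Stand-in for the per-primitive weights a network would predict.
rng = random.Random(1)
dense_weights = [rng.gauss(0.0, 1.0) for _ in range(NUM_ATOMS)]

sparse_weights = sparsify(dense_weights, TOP_K)
feature = decode(sparse_weights, dictionary)

# Per-primitive storage: a full feature vector versus k (index, weight) pairs.
values_per_primitive_dense = FEAT_DIM
values_per_primitive_sparse = 2 * TOP_K
print(values_per_primitive_dense, values_per_primitive_sparse)
```

Under these assumed sizes, each primitive stores 8 values instead of 512, while the dictionary itself is stored once for the whole scene. This is the sense in which a shared dictionary with per-primitive weights can reduce representation complexity.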

Yilong Liu, Wanhua Li, Chen Zhu-Tian, Hanspeter Pfister • 2026

Related benchmarks

Task                                   Dataset                                 Metric          Result  Rank
3D Semantic Segmentation               3D-OVS                                  Bed             67.8    42
Novel View Synthesis                   ScanNet Target View (40 unseen scenes)  PSNR            24.8    12
Open Vocabulary Semantic Segmentation  ScanNet Source View (40 unseen scenes)  mIoU            73.44   11
Open Vocabulary Semantic Segmentation  ScanNet Target View (40 unseen scenes)  mIoU            74.16   11
3D Semantic Segmentation               RE10k (unseen)                          mIoU (Bedroom)  34.33   3
