S2GS: Streaming Semantic Gaussian Splatting for Online Scene Understanding and Reconstruction

About

Existing offline feed-forward methods for joint scene understanding and reconstruction on long image streams often perform global computation repeatedly over an ever-growing set of past observations, so runtime and GPU memory grow rapidly with sequence length and scalability is limited. We propose Streaming Semantic Gaussian Splatting (S2GS), a strictly causal, incremental 3D Gaussian semantic field framework: it never uses future frames, and it continuously updates scene geometry, appearance, and instance-level semantics without reprocessing historical frames, enabling scalable online joint reconstruction and understanding. S2GS adopts a dual-backbone design that decouples geometry from semantics: the geometry branch performs causal modeling to drive incremental Gaussian updates, while the semantic branch uses a 2D vision foundation model and a query-driven decoder to predict segmentation masks and identity embeddings, which are further stabilized by query-level contrastive alignment and lightweight online association with an instance memory. Experiments show that S2GS matches or outperforms strong offline baselines on joint reconstruction-and-understanding benchmarks while substantially improving long-horizon scalability: it processes 1,000+ frames with much slower growth in runtime and GPU memory, whereas offline global-processing baselines typically run out of memory at around 80 frames under the same setting.
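The abstract names lightweight online association with an instance memory but does not spell out the matching rule. The following is a minimal Python sketch of one common way such a memory can work, assuming greedy one-to-one cosine-similarity matching of identity embeddings against a fixed threshold, plus an exponential-moving-average (EMA) update of matched entries. The class `InstanceMemory`, its `associate` method, and the `threshold=0.7` and `momentum=0.9` defaults are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical sketch: the matching rule and hyperparameters are assumptions,
# not taken from the S2GS paper.


def normalize(v: Sequence[float]) -> List[float]:
    """Return v scaled to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


class InstanceMemory:
    """Hypothetical online instance memory: one running embedding per instance ID."""

    def __init__(self, threshold: float = 0.7, momentum: float = 0.9):
        self.threshold = threshold  # assumed minimum cosine similarity to reuse an ID
        self.momentum = momentum    # assumed EMA weight kept on the stored embedding
        self.bank: Dict[int, List[float]] = {}
        self.next_id = 0

    def associate(self, frame_embeddings: List[Sequence[float]]) -> List[int]:
        """Assign an instance ID to each identity embedding of the current frame.

        Matching is greedy and one-to-one within a frame: the most similar
        (query, instance) pairs are accepted first, and unmatched queries
        open new IDs. Only the current frame and the memory are touched.
        """
        queries = [normalize(e) for e in frame_embeddings]
        pairs = sorted(
            (
                (cosine(q, m), qi, iid)
                for qi, q in enumerate(queries)
                for iid, m in self.bank.items()
            ),
            reverse=True,
        )
        ids: List[int] = [-1] * len(queries)
        used = set()
        for sim, qi, iid in pairs:
            if sim < self.threshold:
                break  # all remaining pairs are even less similar
            if ids[qi] != -1 or iid in used:
                continue  # query already matched or instance already taken
            ids[qi] = iid
            used.add(iid)
        for qi, q in enumerate(queries):
            if ids[qi] == -1:
                # No sufficiently similar stored instance: register a new one.
                ids[qi] = self.next_id
                self.bank[self.next_id] = q
                self.next_id += 1
            else:
                # Matched: blend the new observation into the stored embedding.
                old = self.bank[ids[qi]]
                mixed = [
                    self.momentum * o + (1 - self.momentum) * n
                    for o, n in zip(old, q)
                ]
                self.bank[ids[qi]] = normalize(mixed)
        return ids
```

For example, `InstanceMemory().associate([[1.0, 0.0], [0.0, 1.0]])` returns `[0, 1]` on a first frame; a later frame whose embeddings lie close to those two entries reuses the IDs, while a dissimilar embedding opens a new ID. Because each call touches only the current frame's embeddings and the memory, its cost depends on the number of tracked instances rather than on the number of processed frames, which mirrors the causal, no-reprocessing property described above.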

Renhe Zhang, Yuyang Tan, Jingyu Gong, Zhizhong Zhang, Lizhuang Ma, Yuan Xie, Xin Tan • 2026

Related benchmarks

Task	Dataset	Metric	Result	
Novel View Synthesis	Replica	PSNR	15.66	205
Novel View Synthesis	ScanNet	PSNR	18.71	132
Novel View Synthesis	ScanNet++	PSNR	15.33	93
Semantic segmentation	ScanNet short-sequence	mIoU	52.35	21
Novel View Synthesis	ScanNet short-sequence	PSNR	24.9	16
Semantic segmentation	Replica	--	--	16
Semantic segmentation	ScanNet++	mIoU	41.67	15
Temporal Instance Consistency	ScanNet short-sequence	T-mIoU	44.89	12
Online Scene Understanding and Reconstruction	ScanNet 2017	Processing time (s)	0.1	7
Cross-frame Instance Consistency	ScanNet	T-mIoU	26.71	3

