LangGS-SLAM: Real-Time Language-Feature Gaussian Splatting SLAM
About
In this paper, we propose an RGB-D SLAM system that reconstructs a language-aligned dense feature field while sustaining low-latency tracking and mapping. First, we introduce a Top-K Rendering pipeline, a high-throughput and semantic-distortion-free method for efficiently rendering high-dimensional feature maps. To address the resulting semantic-geometric discrepancy and reduce memory consumption, we further design a multi-criteria map management strategy that prunes redundant or inconsistent Gaussians while preserving scene integrity. Finally, a hybrid field optimization framework jointly refines the geometric and semantic fields under real-time constraints by decoupling their optimization frequencies according to field characteristics. The proposed system achieves geometric fidelity superior to geometric-only baselines and semantic fidelity comparable to offline approaches while operating at 15 FPS. Our results demonstrate that online SLAM with dense, uncompressed language-aligned feature fields is both feasible and effective, bridging the gap between 3D perception and language-based reasoning.
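The abstract does not spell out how Top-K Rendering works, so the following is only a minimal sketch of one plausible reading: for a single pixel, compute the usual front-to-back alpha-blending weights, keep the K Gaussians with the largest weights, and blend their uncompressed feature vectors. The function names `blend_weights` and `render_feature_top_k`, the renormalization of the kept weights, and the per-pixel list layout are all illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence


def blend_weights(alphas: Sequence[float]) -> List[float]:
    """Front-to-back compositing weights: w_i = a_i * prod_{j<i} (1 - a_j)."""
    weights = []
    transmittance = 1.0
    for a in alphas:
        weights.append(a * transmittance)
        transmittance *= 1.0 - a
    return weights


def render_feature_top_k(
    alphas: Sequence[float],
    features: Sequence[Sequence[float]],
    k: int,
) -> List[float]:
    """Blend the features of the k highest-weight Gaussians along one pixel ray.

    `alphas` and `features` are depth-sorted (nearest first).
    Assumption: the kept weights are renormalized to sum to 1; the paper
    may handle the discarded weight mass differently.
    """
    weights = blend_weights(alphas)
    # Indices of the k largest blending weights.
    top = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]
    total = sum(weights[i] for i in top)
    dim = len(features[0]) if features else 0
    out = [0.0] * dim
    if total == 0.0:
        return out
    for i in top:
        w = weights[i] / total
        for d in range(dim):
            out[d] += w * features[i][d]
    return out


# Three Gaussians on one ray, 2-D features for brevity.
pixel_feature = render_feature_top_k(
    alphas=[0.5, 0.1, 0.8],
    features=[[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
    k=2,
)
```

With these inputs the blending weights are 0.5, 0.05, and 0.36, so the second Gaussian is dropped when K = 2. Limiting each pixel to K contributors bounds the cost of blending high-dimensional features regardless of how many Gaussians overlap the ray, which is the kind of throughput gain the abstract alludes to.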
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Tracking | TUM-RGBD Dataset | ATE RMSE 2.316 | 11 |
| Semantic segmentation | Replica | -- | 8 |
| Semantic Fidelity | TUM-RGBD fr3 desk | Accuracy 80.5 | 5 |
| Semantic Fidelity | TUM-RGBD fr1/desk | Accuracy 83.8 | 5 |
| Semantic Fidelity | TUM-RGBD fr2 xyz | Accuracy 84.2 | 5 |
| Semantic Fidelity | TUM-RGBD | Accuracy 82.8 | 5 |
| Geometric Fidelity | Replica Dataset | PSNR 35.92 | 4 |
| Geometric Fidelity | TUM-RGBD | PSNR 23.78 | 4 |
| Tracking | Replica Dataset | ATE RMSE 0.213 | 4 |