NRGS-SLAM: Monocular Non-Rigid SLAM for Endoscopy via Deformation-Aware 3D Gaussian Splatting
About
Visual simultaneous localization and mapping (V-SLAM) is a fundamental capability for autonomous perception and navigation. However, endoscopic scenes violate the rigidity assumption due to persistent soft-tissue deformations, creating a strong coupling ambiguity between camera ego-motion and intrinsic deformation. Although recent monocular non-rigid SLAM methods have made notable progress, they often lack effective decoupling mechanisms and rely on sparse or low-fidelity scene representations, which leads to tracking drift and limited reconstruction quality. To address these limitations, we propose NRGS-SLAM, a monocular non-rigid SLAM system for endoscopy based on 3D Gaussian Splatting. To resolve the coupling ambiguity, we introduce a deformation-aware 3D Gaussian map that augments each Gaussian primitive with a learnable deformation probability, optimized via a Bayesian self-supervision strategy without requiring external non-rigidity labels. Building on this representation, we design a deformable tracking module that performs robust coarse-to-fine pose estimation by prioritizing low-deformation regions, followed by efficient per-frame deformation updates. A carefully designed deformable mapping module progressively expands and refines the map, balancing representational capacity and computational efficiency. In addition, a unified robust geometric loss incorporates external geometric priors to mitigate the inherent ill-posedness of monocular non-rigid SLAM. Extensive experiments on multiple public endoscopic datasets demonstrate that NRGS-SLAM achieves more accurate camera pose estimation (up to 50% reduction in RMSE) and higher-quality photo-realistic reconstructions than state-of-the-art methods. Comprehensive ablation studies further validate the effectiveness of our key design choices. Source code will be publicly available upon paper acceptance.
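To make the idea of a per-primitive deformation probability more concrete, here is a minimal sketch of how such a probability could be updated by Bayes' rule and then used to down-weight likely-deforming primitives during pose estimation. This is not the authors' method: the abstract does not specify the likelihood model or the self-supervision objective. The two-hypothesis normal residual model, the names `update_deformation_probability`, `tracking_weights`, and `weighted_pose_cost`, and the parameters `sigma_rigid` and `sigma_deform` are all hypothetical choices for illustration. The coarse-to-fine pose optimization itself is not modeled.

```python
import math
from typing import List, Sequence


def _log_normal(residual: float, sigma: float) -> float:
    """Log-density of a zero-mean normal, dropping the shared -0.5*log(2*pi) term."""
    return -0.5 * (residual / sigma) ** 2 - math.log(sigma)


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def update_deformation_probability(prior: float, residual: float,
                                   sigma_rigid: float = 0.5,
                                   sigma_deform: float = 2.0,
                                   eps: float = 1e-6) -> float:
    """Posterior P(deformable | residual) via Bayes' rule in log-odds form.

    Hypothetical likelihood model: a rigid primitive explains the current
    frame with a small residual (narrow normal), while a deforming one
    produces larger residuals (wide normal).
    """
    prior = min(max(prior, eps), 1.0 - eps)  # keep the prior log-odds finite
    log_odds = math.log(prior / (1.0 - prior))
    log_odds += _log_normal(residual, sigma_deform) - _log_normal(residual, sigma_rigid)
    return _sigmoid(log_odds)


def tracking_weights(probs: Sequence[float]) -> List[float]:
    """Weight each primitive by its probability of being rigid."""
    return [1.0 - p for p in probs]


def weighted_pose_cost(residuals: Sequence[float], probs: Sequence[float]) -> float:
    """Weighted sum of squared residuals that favors low-deformation regions."""
    return sum(w * r * r for w, r in zip(tracking_weights(probs), residuals))


# Usage: two primitives with an uninformative prior of 0.5.
residuals = [0.1, 3.0]
posteriors = [update_deformation_probability(0.5, r) for r in residuals]
cost = weighted_pose_cost(residuals, posteriors)
```

With these illustrative settings, the small-residual primitive ends up with a posterior well below 0.5 and the large-residual one close to 1, so the latter contributes almost nothing to the pose cost. Working in log-odds keeps the update stable even for very large residuals, where the raw likelihoods would underflow to zero.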
Related benchmarks
| Task | Dataset | RMSE | Rank |
|---|---|---|---|
| Camera Localization | StereoMIS (P2-2) | 10.24 | 16 |
| Camera Localization | StereoMIS (P2-4) | 9.45 | 16 |
| Camera Localization | StereoMIS (average) | 6.78 | 16 |
| Camera Localization | StereoMIS (P2-3) | 0.003 | 16 |
| Camera Localization | StereoMIS (P2-5) | 7.41 | 14 |
| Camera Localization | C3VD v2 (c1_descending_t4_v4) | 6.81 | 9 |
| Camera Localization | C3VD v2 (c2_transverse1_t1_v4) | 10.47 | 9 |
| Camera Localization | C3VD v2 (average) | 8.13 | 9 |
| Camera Localization | C3VD v2 (c1_sigmoid2_t4_v4) | 7.26 | 9 |
| Camera Localization | C3VD v2 (c1_sigmoid1_t4_v4) | 7.96 | 8 |
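The benchmarks above report camera-localization RMSE. For reference, trajectory RMSE (often reported as absolute trajectory error, ATE) is commonly computed as the root mean square of per-frame position errors between the estimated and ground-truth camera trajectories. The evaluation protocol and units behind these particular numbers are not stated here. The sketch below assumes the two trajectories are already time-associated and aligned; monocular evaluations commonly apply a similarity alignment (for example, Umeyama's method) first to resolve scale, and that step is omitted. The function name `ate_rmse` is illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def ate_rmse(estimated: Sequence[Point], ground_truth: Sequence[Point]) -> float:
    """Root mean square of per-frame translational errors.

    Assumes both trajectories are time-associated, already aligned, and
    expressed in the same frame and units.
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    sq_errors = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Usage: every estimated position is offset by (3, 4, 0) from ground truth.
est = [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(ate_rmse(est, gt))  # 5.0
```

Because RMSE squares each error before averaging, a few large per-frame deviations (for example, drift during strong tissue deformation) weigh more heavily than many small ones.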