GSO-SLAM: Bidirectionally Coupled Gaussian Splatting and Direct Visual Odometry
About
We propose GSO-SLAM, a real-time monocular dense SLAM system built on a Gaussian scene representation. Existing methods either couple tracking and mapping through a single unified scene representation, which is computationally expensive, or loosely attach mapping to an established tracking framework, which introduces redundancy. Our method instead bidirectionally couples Visual Odometry (VO) and Gaussian Splatting (GS). Specifically, we formulate their joint optimization within an Expectation-Maximization (EM) framework, which refines the VO-derived semi-dense depth estimates and the GS representation simultaneously without additional computational overhead. We also present Gaussian Splat Initialization, which uses image information, keyframe poses, and pixel associations from VO to produce a close approximation of the final Gaussian scene, eliminating the need for heuristic methods. Extensive experiments show that the method runs in real time and achieves state-of-the-art tracking accuracy as well as state-of-the-art geometric and photometric fidelity of the reconstructed scene.
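To make the idea of seeding Gaussians from VO output more concrete, here is a minimal sketch of one common building block: back-projecting pixels with a semi-dense depth estimate through a pinhole model and a keyframe's camera-to-world pose, then creating one Gaussian per pixel coloured from the keyframe image. This is not the GSO-SLAM algorithm. It ignores the cross-keyframe pixel associations the abstract mentions. The types `Gaussian` and `Intrinsics`, the helper functions, and the one-pixel-footprint scale rule (depth divided by focal length) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple

# Hypothetical types for illustration only; not GSO-SLAM's data structures.
Vec3 = Tuple[float, float, float]


@dataclass
class Intrinsics:
    fx: float
    fy: float
    cx: float
    cy: float


@dataclass
class Gaussian:
    mean: Vec3    # world-frame centre
    scale: float  # isotropic size (assumption: one-pixel footprint at that depth)
    color: Vec3   # RGB sampled from the keyframe image


def backproject(u: float, v: float, depth: float, K: Intrinsics) -> Vec3:
    """Pinhole back-projection of pixel (u, v) at `depth` into the camera frame."""
    x = (u - K.cx) / K.fx * depth
    y = (v - K.cy) / K.fy * depth
    return (x, y, depth)


def camera_to_world(p: Vec3, R: Sequence[Sequence[float]], t: Vec3) -> Vec3:
    """Apply a camera-to-world pose: X_w = R @ X_c + t (R given as a list of rows)."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def init_gaussians(
    pixels: Iterable[Tuple[float, float, float, Vec3]],
    K: Intrinsics,
    R: Sequence[Sequence[float]],
    t: Vec3,
) -> List[Gaussian]:
    """Create one Gaussian per (u, v, depth, rgb) sample of a semi-dense depth map."""
    gaussians = []
    for u, v, depth, rgb in pixels:
        if depth <= 0.0:
            continue  # skip pixels without a valid depth estimate
        p_world = camera_to_world(backproject(u, v, depth, K), R, t)
        # Hypothetical size rule: the metric width of one pixel at this depth.
        scale = depth / (0.5 * (K.fx + K.fy))
        gaussians.append(Gaussian(p_world, scale, rgb))
    return gaussians
```

For example, with intrinsics `Intrinsics(500.0, 500.0, 320.0, 240.0)`, an identity pose, and zero translation, a pixel at the principal point with depth 2.0 becomes a Gaussian centred at (0, 0, 2) with scale 0.004, and pixels with non-positive depth are skipped. A real system would also fuse observations of the same surface point across keyframes rather than spawning one Gaussian per pixel per keyframe.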
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Tracking | TUM RGB-D (test) | fr1/desk error | 2.54 | 18 |
| Camera Tracking | TUM RGB-D | Tracking error (fr1/desk) | 2.54 | 16 |
| Tracking | Replica (test) | Rotation error (Rm) | 0.03 | 14 |
| Dense Reconstruction | Replica (average across eight sequences) | PSNR (dB) | 34.48 | 6 |
| Dense SLAM Map Quality and Performance | TUM RGB-D (average across three sequences) | PSNR (dB) | 20.52 | 6 |
| Tracking Accuracy | INS LC_2 | RMSE (m) | 0.47 | 4 |
| Tracking Accuracy | INS LC_3 | RMSE (m) | 0.44 | 4 |
| Tracking Accuracy | INS LC_4 | RMSE (m) | 1 | 4 |
| Tracking Accuracy | INS Average | RMSE (m) | 0.64 | 4 |
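As a quick arithmetic check on the INS rows, the reported average RMSE of 0.64 m is consistent with an unweighted mean of the three per-sequence values (LC_2, LC_3, LC_4). The benchmark listing does not state how its average is computed, so the unweighted mean is an assumption.

```python
# INS tracking RMSE values in metres, copied from the benchmark table.
ins_rmse = {"LC_2": 0.47, "LC_3": 0.44, "LC_4": 1.0}

# Assumption: "INS Average" is the unweighted mean over these three sequences.
mean_rmse = sum(ins_rmse.values()) / len(ins_rmse)
print(f"{mean_rmse:.2f}")  # prints 0.64
```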