MonoEM-GS: Monocular Expectation-Maximization Gaussian Splatting SLAM

About

Feed-forward geometric foundation models can infer dense point clouds and camera motion directly from RGB streams, providing priors for monocular SLAM. However, their predictions are often view-dependent and noisy: geometry can vary across viewpoints and under image transformations, and local metric properties may drift between frames. We present MonoEM-GS, a monocular mapping pipeline that integrates such geometric predictions into a global Gaussian Splatting representation while explicitly addressing these inconsistencies. MonoEM-GS couples Gaussian Splatting with an Expectation-Maximization formulation to stabilize geometry, and employs ICP-based alignment for monocular pose estimation. Beyond geometry, MonoEM-GS parameterizes Gaussians with multi-modal features, enabling in-place open-set segmentation and other downstream queries directly on the reconstructed map. We evaluate MonoEM-GS on 7-Scenes, TUM RGB-D and Replica, and compare against recent baselines.

Evgenii Kruzhkov, Sven Behnke • 2026
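
The abstract mentions ICP-based alignment for pose estimation but gives no details. As a rough illustration only, the sketch below shows the closed-form rigid alignment (the Kabsch/SVD solution) that a generic point-to-point ICP iteration solves once correspondences are fixed. It is not the authors' implementation: the function name `rigid_align`, the use of NumPy, and the assumption that the two point sets are already matched row by row are all choices made here for the example.

```python
import numpy as np


def rigid_align(src, dst):
    """Return (R, t) minimizing sum ||R @ src_i + t - dst_i||^2.

    Hypothetical helper: src and dst are (N, 3) arrays whose rows are
    assumed to be corresponding points (the matching step of ICP is omitted).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)

    # Centre both point sets on their centroids.
    src_mean = src.mean(axis=0)
    dst_mean = dst.mean(axis=0)

    # 3x3 cross-covariance of the centred sets.
    H = (src - src_mean).T @ (dst - dst_mean)

    # Optimal rotation from the SVD of H; the diagonal correction
    # prevents the solution from being a reflection.
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Translation that maps the source centroid onto the target centroid.
    t = dst_mean - R @ src_mean
    return R, t
```

A full ICP loop would alternate this step with a nearest-neighbour search that re-estimates correspondences, repeating until the pose update becomes negligible.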

Related benchmarks

Task | Dataset | Metric | Result | Rank
Camera Tracking | TUM RGB-D | ATE RMSE (cm) | 12 | 18
Dense Reconstruction | TUM RGB-D | Completion Error | 0.15 | 9
3D Semantic Segmentation | Replica 3D | mIoU | 31.5 | 5
Mapping | 7 Scenes | Accuracy | 7 | 5
Localization | 7 Scenes | ATE RMSE | 0.08 | 5
Trajectory Estimation | Replica 3D | ATE RMSE | 13.1 | 3
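
Several rows above report ATE RMSE, the root-mean-square of the absolute trajectory error. The listing does not say how it was computed, so the following is only a minimal sketch of the standard definition. It assumes the estimated and ground-truth camera positions are already time-associated and expressed in the same frame; monocular evaluations usually apply a rigid or similarity alignment first, which is left out here. The function name `ate_rmse` is hypothetical.

```python
import numpy as np


def ate_rmse(est_xyz, gt_xyz):
    """RMSE of per-frame position errors between two aligned trajectories.

    Hypothetical helper: both inputs are (N, 3) arrays of camera positions,
    assumed to be matched frame by frame and already aligned.
    """
    est = np.asarray(est_xyz, dtype=float)
    gt = np.asarray(gt_xyz, dtype=float)
    if est.shape != gt.shape:
        raise ValueError("trajectories must have the same shape")

    # Euclidean distance between estimated and true position in each frame.
    err = np.linalg.norm(est - gt, axis=1)

    # Root of the mean squared distance.
    return float(np.sqrt(np.mean(err ** 2)))
```

The result is in the same unit as the input positions, so a value reported in centimetres implies positions expressed in centimetres.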