Dynamic Visual SLAM using a General 3D Prior

About

Reliable incremental estimation of camera poses and 3D reconstruction is key to enabling various applications, including robotics, interactive visualization, and augmented reality. However, this task is particularly challenging in dynamic natural environments, where scene dynamics can severely degrade camera pose estimation accuracy. In this work, we propose a novel monocular visual SLAM system that robustly estimates camera poses in dynamic scenes. To this end, we leverage the complementary strengths of geometric patch-based online bundle adjustment and recent feed-forward reconstruction models. Specifically, we propose a feed-forward reconstruction model that precisely filters out dynamic regions, and we also use its depth predictions to make the patch-based visual SLAM more robust. By aligning the depth predictions with the patches estimated by bundle adjustment, we robustly handle the scale ambiguities inherent in applying the feed-forward reconstruction model batch by batch.
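
The scale-alignment step can be illustrated with a minimal sketch. It assumes that each tracked patch carries a depth estimated by bundle adjustment, a depth predicted by the feed-forward model at the same pixel (known only up to an unknown per-batch scale), and a flag from the model's dynamic mask. The `Patch` class, `fit_scale`, and `align_depth_map` are hypothetical names, and a single least-squares scale over static patches is an assumed formulation; the actual system may also use a shift term, inverse depth, or robust weighting.

```python
from dataclasses import dataclass


@dataclass
class Patch:
    """Hypothetical per-patch record used only for this illustration."""
    ba_depth: float    # depth of the patch estimated by bundle adjustment (map scale)
    pred_depth: float  # depth predicted by the feed-forward model (arbitrary batch scale)
    dynamic: bool      # True if the model's dynamic mask flags this patch


def fit_scale(patches: list[Patch]) -> float:
    """Closed-form least-squares scale s minimising sum (s * pred - ba)^2.

    Patches flagged as dynamic, or with non-positive depths, are excluded,
    so moving objects do not bias the alignment.
    """
    static = [p for p in patches
              if not p.dynamic and p.pred_depth > 0 and p.ba_depth > 0]
    if not static:
        raise ValueError("no static patches with valid depth to align")
    num = sum(p.pred_depth * p.ba_depth for p in static)
    den = sum(p.pred_depth ** 2 for p in static)
    return num / den


def align_depth_map(pred_depths: list[float], scale: float) -> list[float]:
    """Rescale a batch of predicted depths into the SLAM map's scale."""
    return [scale * d for d in pred_depths]


if __name__ == "__main__":
    patches = [
        Patch(ba_depth=2.0, pred_depth=1.0, dynamic=False),
        Patch(ba_depth=4.0, pred_depth=2.0, dynamic=False),
        Patch(ba_depth=9.0, pred_depth=1.0, dynamic=True),  # ignored: moving object
    ]
    s = fit_scale(patches)
    print(s, align_depth_map([0.5, 3.0], s))
```

Because each batch of predictions gets its own scale, fitting it against the bundle-adjusted patches keeps successive batches consistent with one another; excluding dynamic patches is what prevents moving objects from corrupting that fit.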

Xingguang Zhong, Liren Jin, Marija Popović, Jens Behley, Cyrill Stachniss · 2025

Related benchmarks

Task                        Dataset                  Metric                 Result  Rank
Video Depth Estimation      Sintel                   Relative Error (Rel)   0.287   109
Video Depth Estimation      BONN                     Relative Error (Rel)   0.054   103
Moving Object Segmentation  DAVIS Moving 2016        Jaccard Index          68.1    26
Camera Tracking             BONN dynamic sequences   Balloon Error          2.6     25
Video Object Segmentation   DAVIS 17                 J Score                70.6    25
Camera Pose Estimation      Sintel 14-sequence       ATE                    1.9     15
Camera Tracking             Wild-SLAM MoCap Dataset  Person Tracking Error  0.2     8
