RoDyn-SLAM: Robust Dynamic Dense RGB-D SLAM with Neural Radiance Fields
About
Leveraging neural implicit representations for dense RGB-D SLAM has been studied in recent years. However, this approach relies on a static-environment assumption and does not work robustly in dynamic environments due to inconsistent observations of geometry and photometry. To address the challenges posed by dynamic environments, we propose a novel dynamic SLAM framework built on neural radiance fields. Specifically, we introduce a motion mask generation method to filter out invalid sampled rays. This design fuses an optical flow mask with a semantic mask to enhance the precision of the motion mask. To further improve the accuracy of pose estimation, we design a divide-and-conquer pose optimization algorithm that distinguishes between keyframes and non-keyframes. The proposed edge warp loss effectively strengthens the geometric constraints between adjacent frames. Extensive experiments are conducted on two challenging datasets, and the results show that RoDyn-SLAM achieves state-of-the-art performance among recent neural RGB-D methods in both accuracy and robustness.
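The motion-mask fusion described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the flow-magnitude threshold, and the pixel layout of the sampled rays are all assumptions; the core idea shown is simply taking the union of a flow-based mask and a semantic mask, then discarding rays that land on dynamic pixels.

```python
import numpy as np

def fuse_motion_mask(flow, semantic_mask, flow_thresh=1.0):
    """Fuse a flow-based mask with a semantic mask (hypothetical sketch).

    flow:          (H, W, 2) per-pixel optical flow vectors.
    semantic_mask: (H, W) bool, True where a potentially dynamic
                   class (e.g. a person) was segmented.
    flow_thresh:   assumed flow-magnitude threshold (hyperparameter).
    Returns a (H, W) bool motion mask, True = dynamic pixel.
    """
    flow_mag = np.linalg.norm(flow, axis=-1)
    flow_mask = flow_mag > flow_thresh   # pixels with large apparent motion
    return flow_mask | semantic_mask     # union of the two cues

def filter_rays(ray_pixels, motion_mask):
    """Drop sampled rays whose pixel falls inside the motion mask.

    ray_pixels: (N, 2) integer (u, v) pixel coordinates of sampled rays.
    """
    u, v = ray_pixels[:, 0], ray_pixels[:, 1]
    keep = ~motion_mask[v, u]            # index as [row, col] = [v, u]
    return ray_pixels[keep]
```

In practice the two cues are complementary: semantic segmentation catches known movable classes even when they are momentarily still, while the optical flow cue catches motion from objects outside the segmenter's label set.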
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Tracking | TUM RGB-D 44 (various sequences) | Average Error: 5.26 | 28 |
| Tracking | TUM 8 dynamic scenes | f3 Walk Scale/Translation Error: 1.7 | 28 |
| Camera Tracking | BONN dynamic sequences | Balloon Error: 7.9 | 25 |
| Tracking | Bonn RGB-D dataset | Balloon2: 11.5 | 23 |
| Camera Tracking | TUM dynamic scene sequences RGB-D (test) | f3/w_s ATE (cm): 1.7 | 17 |
| Tracking | TUM-RGBD (various sequences) | Average Translational Error: 5.26 | 16 |
| Camera Tracking | TUM dynamic scene sequences | ATE Component w_x (f3): 8.3 | 15 |
| Tracking Accuracy | BONN | bal1 Score: 7.9 | 8 |
| Pose Estimation | TUM | s.s Error: 1.5 | 8 |
| 3D Mapping | Bonn person_tracking | Accuracy (cm): 10.2 | 4 |