
RayMap3R: Inference-Time RayMap for Dynamic 3D Reconstruction

About

Streaming feed-forward 3D reconstruction enables real-time joint estimation of scene geometry and camera poses from RGB images. However, without explicit dynamic reasoning, streaming models can be affected by moving objects, causing artifacts and drift. In this work, we propose RayMap3R, a training-free streaming framework for dynamic scene reconstruction. We observe that RayMap-based predictions exhibit a static-scene bias, providing an internal cue for dynamic identification. Based on this observation, we construct a dual-branch inference scheme that identifies dynamic regions by contrasting RayMap and image predictions, suppressing their interference during memory updates. We further introduce reset metric alignment and state-aware smoothing to preserve metric consistency and stabilize predicted trajectories. Our method achieves state-of-the-art performance among streaming approaches on dynamic scene reconstruction across multiple benchmarks.
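The dual-branch idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the per-pixel point maps, the Euclidean disagreement measure, the `threshold` cutoff, the per-pixel memory layout, and the blend `rate` are all assumptions made for illustration. The names `dynamic_mask` and `masked_memory_update` are hypothetical.

```python
import math


def dynamic_mask(image_points, raymap_points, threshold=0.1):
    """Flag pixels where the two branches disagree (assumed criterion).

    image_points / raymap_points: H x W nested lists of (x, y, z) tuples,
    standing in for per-pixel 3D predictions from the image branch and
    the RayMap branch. threshold is an assumed cutoff in scene units.
    """
    return [
        [math.dist(p, q) > threshold for p, q in zip(img_row, ray_row)]
        for img_row, ray_row in zip(image_points, raymap_points)
    ]


def masked_memory_update(memory, new_features, mask, rate=0.5):
    """Blend new features into memory only at pixels flagged static.

    memory / new_features: H x W nested lists of floats (hypothetical
    layout). Pixels flagged dynamic keep their old memory value, so
    moving objects do not contaminate the stored state.
    """
    return [
        [m if dyn else (1.0 - rate) * m + rate * f
         for m, f, dyn in zip(mem_row, feat_row, mask_row)]
        for mem_row, feat_row, mask_row in zip(memory, new_features, mask)
    ]


# Toy 1 x 2 frame: the branches agree on the first pixel only.
image_pts = [[(0.0, 0.0, 1.0), (0.0, 0.0, 2.0)]]
raymap_pts = [[(0.0, 0.0, 1.0), (0.0, 0.0, 3.0)]]
mask = dynamic_mask(image_pts, raymap_pts)
updated = masked_memory_update([[1.0, 1.0]], [[3.0, 3.0]], mask)
```

In this toy frame the second pixel is flagged dynamic and its memory value is left unchanged, while the first pixel is blended toward the new feature. A hard threshold is the simplest choice here; a soft weight derived from the disagreement would be an equally plausible variant.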

Feiran Wang, Zezhou Shang, Gaowen Liu, Yan Yan • 2026

Related benchmarks

Task                            Dataset                       Result                 Rank
Camera pose estimation          TUM-dynamic                   ATE 0.018              163
Camera pose estimation          ScanNet static indoor scenes  ATE 0.064              25
Video Depth Estimation          KITTI 13                      Abs Rel 9.8            13
Video Depth Estimation          BONN 23                       Abs Rel Error 0.057    13
Video Depth Estimation          Sintel 2                      Abs Rel Error 0.401    13
Camera pose estimation          Sintel (train)                ATE 0.166              10
Inference speed and GPU memory  ScanNet v2 (test)             Peak Memory Usage 9.2  9
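
For readers unfamiliar with the two accuracy metrics reported above, the following sketch shows how they are commonly computed. The page does not state the exact evaluation protocol, so this is an assumption: Abs Rel is taken as the mean of |pred - gt| / gt over pixels with valid ground truth, and ATE as the root-mean-square position error between trajectories that are assumed to be already aligned (benchmarks usually apply an alignment step first, which is omitted here). The function names are hypothetical.

```python
import math


def abs_rel(pred_depths, gt_depths):
    """Mean absolute relative depth error over pixels with gt > 0."""
    pairs = [(p, g) for p, g in zip(pred_depths, gt_depths) if g > 0]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def ate_rmse(pred_positions, gt_positions):
    """RMSE of camera position error; trajectories assumed pre-aligned."""
    squared = [math.dist(p, g) ** 2
               for p, g in zip(pred_positions, gt_positions)]
    return math.sqrt(sum(squared) / len(squared))


# Toy inputs: the third depth pixel has no valid ground truth and is skipped.
depth_error = abs_rel([1.1, 2.0, 5.0], [1.0, 2.0, 0.0])
pose_error = ate_rmse([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                      [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)])
```

Lower is better for both metrics. Reporting conventions differ between benchmarks: some give Abs Rel as a fraction and others as a percentage, and the page does not say which applies to each row.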