Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory

About

Dense 3D scene reconstruction from ordered sequences or unordered image collections is a critical step when bringing computer vision research into practical scenarios. Following the paradigm introduced by DUSt3R, which densely maps an image pair into a shared coordinate system, subsequent methods maintain an implicit memory to achieve dense 3D reconstruction from more images. However, such implicit memory is limited in capacity and may lose information from earlier frames. We propose Point3R, an online framework targeting dense streaming 3D reconstruction. Specifically, we maintain an explicit spatial pointer memory directly associated with the 3D structure of the current scene. Each pointer in this memory is assigned a specific 3D position and aggregates nearby scene information in the global coordinate system into a changing spatial feature. Information extracted from the latest frame interacts explicitly with this pointer memory, enabling dense integration of the current observation into the global coordinate system. We design a 3D hierarchical position embedding to promote this interaction and a simple yet effective fusion mechanism to ensure that our pointer memory is uniform and efficient. Our method achieves competitive or state-of-the-art performance on various tasks with low training costs. Code: https://github.com/YkiWu/Point3R.
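
The idea of an explicit pointer memory can be illustrated with a toy sketch. This is not the authors' implementation: the names `Pointer`, `PointerMemory`, and `merge_radius`, and the running-mean update, are all assumptions chosen for illustration. The abstract only states that each pointer has a 3D position and a spatial feature, and that a fusion mechanism keeps the memory uniform and efficient. Here, fusion is approximated by merging a new observation into the nearest existing pointer when it lies within a distance threshold.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Pointer:
    position: tuple  # (x, y, z) in the global coordinate system
    feature: list    # spatial feature aggregated from nearby observations
    count: int = 1   # number of observations fused into this pointer


@dataclass
class PointerMemory:
    merge_radius: float  # hypothetical threshold, not taken from the paper
    pointers: list = field(default_factory=list)

    def _nearest(self, position):
        """Return (index, distance) of the closest pointer, or (None, inf)."""
        best_idx, best_dist = None, math.inf
        for i, p in enumerate(self.pointers):
            d = math.dist(p.position, position)
            if d < best_dist:
                best_idx, best_dist = i, d
        return best_idx, best_dist

    def integrate(self, position, feature):
        """Fuse one observation into memory.

        If an existing pointer lies within merge_radius, update its position
        and feature with a running mean (an assumed rule); otherwise append
        a new pointer at the observed position.
        """
        idx, dist = self._nearest(position)
        if idx is not None and dist <= self.merge_radius:
            p = self.pointers[idx]
            n = p.count
            p.position = tuple(
                (a * n + b) / (n + 1) for a, b in zip(p.position, position)
            )
            p.feature = [
                (a * n + b) / (n + 1) for a, b in zip(p.feature, feature)
            ]
            p.count = n + 1
        else:
            self.pointers.append(Pointer(tuple(position), list(feature)))


memory = PointerMemory(merge_radius=0.5)
memory.integrate((0.0, 0.0, 0.0), [1.0, 0.0])
memory.integrate((0.2, 0.0, 0.0), [0.0, 1.0])  # within radius: fused
memory.integrate((3.0, 0.0, 0.0), [1.0, 1.0])  # far away: new pointer
print(len(memory.pointers))
```

Merging nearby observations bounds the number of pointers per region of space, which is one plausible way to keep an explicit memory from growing with every frame. The actual method uses learned features and a 3D hierarchical position embedding, neither of which is modeled here.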

Yuqi Wu, Wenzhao Zheng, Jie Zhou, Jiwen Lu • 2025

Related benchmarks

Task | Dataset | Result | Rank
Video Depth Estimation | Sintel | Relative Error (Rel): 0.452 | 109
Video Depth Estimation | BONN | Relative Error (Rel): 0.06 | 103
Camera pose estimation | Sintel | ATE: 0.351 | 92
Camera pose estimation | ScanNet | ATE RMSE (Avg.): 0.106 | 61
Camera pose estimation | TUM dynamics | RRE: 0.642 | 57
Video Depth Estimation | Sintel (test) | Delta 1 Accuracy: 48.9 | 57
Video Depth Estimation | KITTI | Abs Rel: 0.136 | 47
Camera Localization | 7 Scenes | Average Position Error (m): 0.439 | 46
3D Reconstruction | Neural RGB-D (NRGBD) | Acc Mean: 0.113 | 38
Video Depth Estimation | Bonn (test) | Abs Rel: 0.06 | 37
Showing 10 of 18 rows
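
For readers unfamiliar with the depth metrics listed above, the following sketch shows how absolute relative error (Abs Rel) and Delta 1 accuracy are commonly defined. It is an illustration under assumptions, not the evaluation code behind these numbers: the function names are hypothetical, the 1.25 threshold is the conventional choice, Delta 1 is returned as a percentage (which the value 48.9 appears to be), and any scale or shift alignment that a benchmark may apply before scoring is omitted.

```python
def abs_rel(pred, gt):
    """Mean of |pred - gt| / gt over pixels with valid (positive) ground truth."""
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0]
    return sum(abs(p - g) / g for p, g in valid) / len(valid)


def delta1(pred, gt, threshold=1.25):
    """Percentage of valid pixels where max(pred/gt, gt/pred) < threshold."""
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    hits = sum(1 for p, g in valid if max(p / g, g / p) < threshold)
    return 100.0 * hits / len(valid)


# Toy depth values in meters, flattened to a list of pixels.
predicted = [1.0, 2.2, 3.0, 6.0]
ground_truth = [1.0, 2.0, 4.0, 4.0]

print(abs_rel(predicted, ground_truth))
print(delta1(predicted, ground_truth))
```

Lower is better for Abs Rel, while higher is better for Delta 1 accuracy.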
