Ray-Aware Pointer Memory with Adaptive Updates for Streaming 3D Reconstruction

About

Dense 3D reconstruction from continuous image streams requires both accurate geometric aggregation and stable long-term memory management. Recent feed-forward reconstruction frameworks integrate observations through persistent memory representations, yet most rely primarily on appearance-based similarity when updating memory. Such appearance-driven integration often leads to redundant accumulation of observations and unstable geometry when viewpoint changes occur. In this work, we propose a ray-aware pointer memory for streaming 3D reconstruction that explicitly models both spatial location and viewing direction within a unified memory representation. Each memory pointer stores its 3D position, associated ray direction, and feature embedding, allowing the system to reason jointly about geometric proximity and viewpoint consistency. Based on this representation, we introduce an adaptive pointer update strategy that replaces traditional fusion-based memory compression with a retain-or-replace mechanism. Instead of averaging nearby observations, the system selectively retains informative pointers while discarding redundant ones, preserving distinctive geometric structures while maintaining bounded memory growth. Furthermore, the joint reasoning over spatial distance and ray-direction discrepancy enables the system to distinguish between local redundancy, novel observations, and potential loop revisits in a unified manner. When loop candidates are detected, pose refinement is triggered to enforce global geometric consistency across the reconstruction. Extensive experiments demonstrate that the proposed ray-aware memory design significantly improves long-term reconstruction stability and camera pose accuracy while maintaining efficient streaming inference. Our approach provides a principled framework for scalable and drift-resistant online 3D reconstruction from image streams.
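The retain-or-replace update described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no thresholds, scoring function, or loop criterion, so every name here (`Pointer`, `classify`, `update`, `score`, `frame`, `dist_thresh`, `angle_thresh`, `loop_gap`) is hypothetical. In particular, separating a loop revisit from local redundancy by a frame-index gap is an assumption made only for illustration.

```python
from dataclasses import dataclass
import numpy as np


@dataclass
class Pointer:
    position: np.ndarray  # 3D position, shape (3,)
    ray: np.ndarray       # unit viewing-ray direction, shape (3,)
    feature: np.ndarray   # feature embedding
    frame: int            # assumption: index of the frame that produced the pointer
    score: float          # assumption: scalar informativeness, e.g. a confidence


def ray_angle(a, b):
    """Angle in radians between two unit ray directions."""
    return float(np.arccos(np.clip(np.dot(a, b), -1.0, 1.0)))


def classify(new, memory, dist_thresh=0.05,
             angle_thresh=np.deg2rad(15.0), loop_gap=100):
    """Label a new pointer as 'novel', 'redundant', or 'loop'.

    A stored pointer matches only if it is close in space AND viewed
    from a similar direction; all thresholds are illustrative guesses.
    """
    best, best_d = None, np.inf
    for i, p in enumerate(memory):
        d = float(np.linalg.norm(new.position - p.position))
        if (d <= dist_thresh
                and ray_angle(new.ray, p.ray) <= angle_thresh
                and d < best_d):
            best, best_d = i, d
    if best is None:
        return "novel", None       # new location or new viewpoint
    if new.frame - memory[best].frame > loop_gap:
        return "loop", best        # assumption: old match => possible revisit
    return "redundant", best


def update(memory, new, **thresholds):
    """Retain-or-replace: never average, keep the higher-scoring pointer."""
    label, i = classify(new, memory, **thresholds)
    if label == "novel":
        memory.append(new)
    elif new.score > memory[i].score:
        memory[i] = new            # replace the stored pointer
    return label                   # on 'loop', a caller could trigger pose refinement
```

Because a matched pointer is either kept or overwritten, memory grows only when an observation is novel in position or viewing direction, which is one way to obtain the bounded growth the abstract describes.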

Feifei Li, Qi Song, Chi Zhang, Rui Huang • 2026

Related benchmarks

Task	Dataset	Result
Camera pose estimation	Sintel	ATE 0.213	203
Depth Estimation	KITTI	--	184
3D Reconstruction	7 Scenes	--	161
Camera pose estimation	TUM dynamics	ATE 0.049	90
3D Reconstruction	NRGBD	Accuracy Mean 6.1	88
Depth Estimation	BONN	Abs Rel 0.059	67
Camera pose estimation	ScanNet static indoor scenes	ATE 0.086	40
Depth Estimation	Sintel	Abs Rel 0.376	33
Depth Estimation	NYU Static v2	Abs Rel 0.073	7
