
TTSA3R: Training-Free Temporal-Spatial Adaptive Persistent State for Streaming 3D Reconstruction

About

Streaming recurrent models enable efficient 3D reconstruction by maintaining persistent state representations. However, they suffer from catastrophic forgetting over long sequences because they must balance historical information against new observations. Recent methods alleviate this by deriving adaptive signals from an attention perspective, but they operate along a single dimension without considering temporal and spatial consistency. To this end, we propose a training-free framework, termed TTSA3R, that leverages both temporal state evolution and spatial observation quality for adaptive state updates in 3D reconstruction. In particular, we devise a Temporal Adaptive Update Module that regulates the update magnitude by analyzing temporal state-evolution patterns. A Spatial Contextual Update Module is then introduced to localize the spatial regions that require updates, via observation-state alignment and scene dynamics. These complementary signals are finally fused to determine the state update strategy. Extensive experiments demonstrate the effectiveness of TTSA3R across diverse 3D tasks. Moreover, on extended 3D reconstruction sequences our method exhibits only a 1.33× error increase, compared to over 4× degradation for the baseline model, significantly improving long-term reconstruction stability. Our code is available at https://github.com/anonus2357/ttsa3r.
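The abstract describes fusing a temporal signal (state-evolution magnitude) with a spatial signal (per-location observation-state alignment) into one adaptive update gate. The sketch below illustrates this idea in NumPy; the function name, the specific drift and cosine-misalignment formulas, and the `alpha` fusion weight are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def adaptive_state_update(state, new_obs, prev_state, alpha=0.5):
    """Illustrative fused temporal-spatial adaptive state update.

    state, new_obs, prev_state: (H, W, C) feature maps.
    Returns the updated persistent state as a per-location blend.
    """
    # Temporal signal: global state drift since the previous step.
    # A stable state gets a small gate, protecting history from overwriting.
    drift = np.linalg.norm(state - prev_state) / (np.linalg.norm(prev_state) + 1e-8)
    temporal_gate = np.clip(drift, 0.0, 1.0)  # scalar in [0, 1]

    # Spatial signal: per-location cosine misalignment between the new
    # observation and the current state, high where the observation disagrees.
    dot = (state * new_obs).sum(axis=-1)
    norms = np.linalg.norm(state, axis=-1) * np.linalg.norm(new_obs, axis=-1) + 1e-8
    spatial_gate = np.clip(1.0 - dot / norms, 0.0, 1.0)  # (H, W)

    # Fuse both signals into one per-location gate, then blend.
    gate = alpha * temporal_gate + (1.0 - alpha) * spatial_gate  # (H, W)
    return (1.0 - gate[..., None]) * state + gate[..., None] * new_obs
```

Because the gate lies in [0, 1], each spatial location receives a convex combination of the old state and the new observation: when the state is stable and the observation agrees with it, the gate is near zero and the state is preserved, which is the mechanism the abstract credits for long-sequence stability.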

Zhijie Zheng, Xinhao Xiang, Jiawei Zhang • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Camera pose estimation | Sintel | ATE | 0.21 | 92 |
| Camera pose estimation | ScanNet | ATE RMSE (Avg.) | 0.057 | 61 |
| Camera pose estimation | TUM dynamics | RRE | 0.372 | 57 |
| Video depth estimation | KITTI (short sequences) | Abs Rel | 0.11 | 20 |
| Video depth estimation | Sintel (short sequences) | Abs Rel | 0.401 | 20 |
| Video depth estimation | Bonn (short sequences) | Abs Rel | 0.064 | 20 |
