
TTT3R: 3D Reconstruction as Test-Time Training

About

Modern recurrent neural networks have become a competitive architecture for 3D reconstruction due to their linear-time complexity. However, their performance degrades significantly when applied beyond the training context length, revealing limited length generalization. In this work, we revisit 3D reconstruction foundation models from a Test-Time Training perspective, framing their designs as an online learning problem. Building on this perspective, we leverage the alignment confidence between the memory state and incoming observations to derive a closed-form learning rate for memory updates, balancing the retention of historical information against adaptation to new observations. This training-free intervention, termed TTT3R, substantially improves length generalization, achieving a $2\times$ improvement in global pose estimation over baselines, while operating at 20 FPS with just 6 GB of GPU memory to process thousands of images. Code is available at https://rover-xingyu.github.io/TTT3R
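
The idea of a confidence-derived learning rate for memory updates can be illustrated with a small sketch. This is not the paper's formula or code: the function names, the sigmoid gate, the scaled dot-product score, and the `candidate` state (standing in for whatever a recurrent model would otherwise write into memory) are all assumptions chosen for illustration. Each memory token is moved toward its candidate by a per-token rate in (0, 1) computed from how well that token aligns with the incoming observation tokens.

```python
import math


def sigmoid(x):
    """Logistic function; maps a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    """Dot product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def alignment_confidence(state_token, obs_tokens):
    """Hypothetical confidence score (an assumption, not the paper's definition):
    sigmoid of the mean scaled dot product between one memory token
    and all observation tokens."""
    scale = 1.0 / math.sqrt(len(state_token))
    mean_score = sum(dot(state_token, o) for o in obs_tokens) * scale / len(obs_tokens)
    return sigmoid(mean_score)


def gated_update(state, candidate, obs_tokens):
    """Move each memory token toward its candidate by a per-token rate beta.

    beta near 0 keeps the old memory (retain history);
    beta near 1 adopts the candidate (adapt to the new observation).
    """
    new_state = []
    for s, c in zip(state, candidate):
        beta = alignment_confidence(s, obs_tokens)
        new_state.append([si + beta * (ci - si) for si, ci in zip(s, c)])
    return new_state


# Illustrative values only: one 2-D memory token, one observation token.
state = [[1.0, 0.0]]
candidate = [[3.0, 2.0]]
observation = [[0.0, 1.0]]  # orthogonal to the memory token, so beta = 0.5
updated = gated_update(state, candidate, observation)
```

In this toy setup the memory token is orthogonal to the observation, so the gate sits at 0.5 and the update lands halfway between the old state and the candidate. A strongly aligned observation pushes the rate toward 1, and a strongly anti-aligned one pushes it toward 0, which leaves the memory almost unchanged.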

Xingyu Chen, Yue Chen, Yuliang Xiu, Andreas Geiger, Anpei Chen • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Depth Estimation | Sintel | Delta Threshold Accuracy (1.25) | 56.6 | 193 |
| Camera pose estimation | Sintel | ATE | 0.201 | 192 |
| Camera pose estimation | TUM-dynamic | ATE | 0.014 | 163 |
| Video Depth Estimation | KITTI | Abs Rel | 0.12 | 126 |
| Camera pose estimation | ScanNet | RPE (t) | 0.021 | 119 |
| Video Depth Estimation | BONN | AbsRel | 5.4 | 116 |
| Video Depth Estimation | BONN | Relative Error (Rel) | 0.068 | 103 |
| 3D Reconstruction | 7 Scenes | Accuracy Median | 8 | 94 |
| 3D Reconstruction | Neural RGB-D (NRGBD) | Acc Mean | 0.169 | 88 |
| Camera pose estimation | TUM dynamics | ATE | 0.028 | 81 |

Showing 10 of 104 rows