
TTT3R: 3D Reconstruction as Test-Time Training

About

Modern Recurrent Neural Networks have become a competitive architecture for 3D reconstruction due to their linear-time complexity. However, their performance degrades significantly when applied beyond the training context length, revealing limited length generalization. In this work, we revisit 3D reconstruction foundation models from a Test-Time Training perspective, framing their design as an online learning problem. Building on this perspective, we leverage the alignment confidence between the memory state and incoming observations to derive a closed-form learning rate for memory updates, balancing the retention of historical information against adaptation to new observations. This training-free intervention, termed TTT3R, substantially improves length generalization, achieving a $2\times$ improvement in global pose estimation over baselines, while operating at 20 FPS with just 6 GB of GPU memory to process thousands of images. Code available at https://rover-xingyu.github.io/TTT3R
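The core idea above is a confidence-weighted online memory update: when an incoming observation already aligns well with the memory state, the update rate shrinks (retain history); when it does not, the rate grows (adapt). The sketch below illustrates that mechanism in pure Python. The cosine-similarity confidence and the `ttt3r_style_update` function are hypothetical stand-ins for illustration; the paper's actual closed-form learning rate is derived differently.

```python
import math

def ttt3r_style_update(state, key, value, beta_max=1.0):
    """One recurrent memory update with a confidence-derived learning rate.

    `state` is a d x d associative memory (list of lists); `key` and `value`
    are length-d vectors from the incoming observation. The confidence
    measure here (cosine similarity) is an illustrative assumption, not the
    paper's closed form.
    """
    d = len(key)
    # Memory's current prediction for this key.
    pred = [sum(state[i][j] * key[j] for j in range(d)) for i in range(d)]
    # Alignment confidence: cosine similarity between prediction and observation.
    dot = sum(p * v for p, v in zip(pred, value))
    norm = (math.sqrt(sum(p * p for p in pred))
            * math.sqrt(sum(v * v for v in value)))
    cos = dot / norm if norm > 0 else 0.0
    # High alignment -> small learning rate (retain history);
    # low alignment -> large learning rate (adapt to the new observation).
    lr = beta_max * (1.0 - 0.5 * (cos + 1.0))
    # Delta-rule-style update of the memory toward the new observation.
    for i in range(d):
        for j in range(d):
            state[i][j] += lr * (value[i] - pred[i]) * key[j]
    return state, lr
```

Because the learning rate is computed in closed form from the state and observation, no gradient steps or extra training are needed at inference time, which is what makes the intervention training-free.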

Xingyu Chen, Yue Chen, Yuliang Xiu, Andreas Geiger, Anpei Chen • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Depth Estimation | BONN | Relative Error (Rel) | 0.068 | 103 |
| Camera Pose Estimation | Sintel | ATE | 0.201 | 92 |
| Camera Pose Estimation | ScanNet | ATE RMSE (Avg.) | 0.064 | 61 |
| Camera Pose Estimation | TUM dynamics | RRE | 0.38 | 57 |
| Video Depth Estimation | Sintel (test) | Delta 1 Accuracy | 50 | 57 |
| Camera Localization | 7 Scenes | Average Position Error (m) | 0.143 | 46 |
| 3D Reconstruction | Neural RGB-D (NRGBD) | Acc Mean | 0.165 | 38 |
| Video Depth Estimation | Bonn (test) | Abs Rel | 0.068 | 37 |
| Object Tracking | Arctic Dataset | ATE RMSE (m) | 0.156 | 33 |
| 3D Reconstruction | 7 Scenes | Accuracy Mean | 6.2 | 32 |
(Showing 10 of 22 benchmark rows.)
