TTT3R: 3D Reconstruction as Test-Time Training
About
Modern recurrent neural networks have become a competitive architecture for 3D reconstruction due to their linear-time complexity. However, their performance degrades significantly when applied beyond the training context length, revealing limited length generalization. In this work, we revisit 3D reconstruction foundation models from a Test-Time Training perspective, framing their design as an online learning problem. Building on this perspective, we leverage the alignment confidence between the memory state and incoming observations to derive a closed-form learning rate for memory updates, balancing the retention of historical information against adaptation to new observations. This training-free intervention, termed TTT3R, substantially improves length generalization, achieving a $2\times$ improvement in global pose estimation over baselines, while operating at 20 FPS with just 6 GB of GPU memory to process thousands of images. Code available at https://rover-xingyu.github.io/TTT3R
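The confidence-weighted memory update described above can be sketched as a simple online running average whose learning rate is derived in closed form from an alignment score. This is a minimal illustrative sketch, not the paper's implementation: the cosine-based confidence proxy, the function names, and the `eta = c / (count + c)` closed form are assumptions chosen to show the retain-vs-adapt trade-off.

```python
import numpy as np

def alignment_confidence(state, obs):
    """Illustrative proxy for alignment confidence: cosine similarity between
    the memory state and the incoming observation, mapped to [0, 1]."""
    num = float(state @ obs)
    denom = np.linalg.norm(state) * np.linalg.norm(obs) + 1e-8
    return 0.5 * (1.0 + num / denom)

def update_memory(state, obs, count):
    """Confidence-weighted online update (hypothetical closed form).

    A well-aligned observation gets a larger learning rate; eta = c / (count + c)
    behaves like a running average whose effective sample size grows with the
    accumulated confidence, so old information is never fully overwritten."""
    c = alignment_confidence(state, obs)
    eta = c / (count + c)                      # closed-form learning rate
    new_state = (1.0 - eta) * state + eta * obs
    return new_state, count + c                # accumulate effective count

# Toy stream: the memory converges toward the mean of aligned observations.
rng = np.random.default_rng(0)
state, count = rng.normal(size=4), 1.0
for _ in range(100):
    obs = np.array([1.0, 0.0, 0.0, 0.0]) + 0.01 * rng.normal(size=4)
    state, count = update_memory(state, obs, count)
```

Because `eta` shrinks as confidence accumulates, later observations perturb the memory less, which is the training-free mechanism that prevents the state from being dominated by the most recent frames.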
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Depth Estimation | BONN | Relative Error (Rel) | 0.068 | 103 |
| Camera Pose Estimation | Sintel | ATE | 0.201 | 92 |
| Camera Pose Estimation | ScanNet | ATE RMSE (Avg.) | 0.064 | 61 |
| Camera Pose Estimation | TUM dynamics | RRE | 0.38 | 57 |
| Video Depth Estimation | Sintel (test) | Delta 1 Accuracy | 50 | 57 |
| Camera Localization | 7 Scenes | Average Position Error (m) | 0.143 | 46 |
| 3D Reconstruction | Neural RGB-D (NRGBD) | Acc Mean | 0.165 | 38 |
| Video Depth Estimation | Bonn (test) | Abs Rel | 0.068 | 37 |
| Object Tracking | Arctic Dataset | ATE RMSE (m) | 0.156 | 33 |
| 3D Reconstruction | 7 Scenes | Accuracy Mean | 6.2 | 32 |