WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool

About

We present WinT3R, a feed-forward reconstruction model capable of online prediction of precise camera poses and high-quality point maps. Previous methods suffer from a trade-off between reconstruction quality and real-time performance. To address this, we first introduce a sliding window mechanism that ensures sufficient information exchange among frames within the window, thereby improving the quality of geometric predictions without heavy computation. In addition, we leverage a compact representation of cameras and maintain a global camera token pool, which enhances the reliability of camera pose estimation without sacrificing efficiency. These designs enable WinT3R to achieve state-of-the-art performance in terms of online reconstruction quality, camera pose estimation, and reconstruction speed, as validated by extensive experiments on diverse datasets. Code and model are publicly available at https://github.com/LiZizun/WinT3R.

Zizun Li, Jianjun Zhou, Yifan Wang, Haoyu Guo, Wenzheng Chang, Yang Zhou, Haoyi Zhu, Junyi Chen, Chunhua Shen, Tong He • 2025
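
The two ideas named in the abstract, a sliding window over the incoming frames and a global pool of per-frame camera tokens, can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the window size of 4, the stride of 2, and the use of integer frame ids as stand-ins for camera tokens are all assumptions made for illustration. A real model would compute tokens with a network instead of copying ids.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple


def sliding_windows(
    frames: Iterable[int], window_size: int = 4, stride: int = 2
) -> Iterator[Tuple[int, ...]]:
    """Yield overlapping windows over a frame stream as frames arrive."""
    if not 0 < stride <= window_size:
        raise ValueError("stride must be in the range 1..window_size")
    buffer: Deque[int] = deque(maxlen=window_size)  # drops the oldest frame automatically
    since_last = 0  # frames received since the last emitted window
    for frame in frames:
        buffer.append(frame)
        since_last += 1
        if len(buffer) == window_size and since_last >= stride:
            yield tuple(buffer)
            since_last = 0


def stream_with_pool(
    frames: Iterable[int], window_size: int = 4, stride: int = 2
) -> Iterator[Tuple[Tuple[int, ...], int]]:
    """Process windows online while growing a global pool of camera tokens.

    Hypothetical stand-in: each frame contributes one compact token (here,
    just its id) to the pool, so later windows can consult all earlier cameras.
    """
    camera_token_pool: List[int] = []
    for window in sliding_windows(frames, window_size, stride):
        for frame in window:
            if frame not in camera_token_pool:  # overlapping frames are added once
                camera_token_pool.append(frame)
        yield window, len(camera_token_pool)


if __name__ == "__main__":
    for window, pool_size in stream_with_pool(range(8)):
        print(window, pool_size)
```

With eight frames, the sketch yields the windows (0, 1, 2, 3), (2, 3, 4, 5), and (4, 5, 6, 7), and the pool grows to 4, 6, and 8 tokens. The point is that each window costs a bounded amount of work while the pool keeps a small per-frame summary of everything seen so far.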

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Video Depth Estimation | Sintel | Delta Threshold Accuracy (1.25) | 53.7 | 193 |
| Camera pose estimation | Sintel | ATE | 0.225 | 192 |
| Camera pose estimation | TUM-dynamic | ATE | 0.07 | 163 |
| Video Depth Estimation | KITTI | Abs Rel | 0.201 | 126 |
| Camera pose estimation | ScanNet | RPE (t) | 0.02 | 119 |
| Video Depth Estimation | BONN | AbsRel | 7.1 | 116 |
| 3D Reconstruction | 7 Scenes | -- | -- | 94 |
| Video Depth Estimation | Sintel (test) | Delta 1 Accuracy | 50.6 | 61 |
| Camera pose estimation | TUM | ATE | 0.074 | 55 |
| Video Depth Estimation | TUM dynamics | Abs Rel | 0.177 | 53 |
Showing 10 of 24 rows
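
For readers unfamiliar with the depth metrics listed above, the following is a minimal sketch of the standard definitions of absolute relative error (Abs Rel) and threshold accuracy (the fraction of pixels with max(pred/gt, gt/pred) below 1.25). The function names are hypothetical, and the sketch does not reproduce each benchmark's exact protocol (for example scale alignment, depth caps, or whether values are reported as fractions or percentages).

```python
from typing import List, Sequence, Tuple


def _valid_pairs(pred: Sequence[float], gt: Sequence[float]) -> List[Tuple[float, float]]:
    """Keep only pixels where both predicted and ground-truth depth are positive."""
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid depth pairs")
    return pairs


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean of |pred - gt| / gt over valid pixels (lower is better)."""
    pairs = _valid_pairs(pred, gt)
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def delta_accuracy(pred: Sequence[float], gt: Sequence[float], threshold: float = 1.25) -> float:
    """Fraction of valid pixels with max(pred/gt, gt/pred) < threshold (higher is better)."""
    pairs = _valid_pairs(pred, gt)
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    return hits / len(pairs)


if __name__ == "__main__":
    pred = [1.0, 2.0, 4.0]
    gt = [1.0, 2.0, 2.0]
    print(abs_rel(pred, gt), delta_accuracy(pred, gt))
```

In this toy input, two of the three predictions match the ground truth exactly and one is off by a factor of two, so Abs Rel is 1/3 and the threshold accuracy is 2/3.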
