LongStream: Long-Sequence Streaming Autoregressive Visual Geometry

About

Long-sequence streaming 3D reconstruction remains a significant open challenge. Existing autoregressive models often fail when processing long sequences because they anchor poses to the first frame, leading to attention decay, scale drift, and extrapolation errors. We introduce LongStream, a novel gauge-decoupled streaming visual geometry model for metric-scale scene reconstruction across thousands of frames under a strictly online, future-invisible setting. Our approach is threefold. First, we discard the first-frame anchor and predict keyframe-relative poses. This reformulates long-range extrapolation into a constant-difficulty local task. Second, we introduce orthogonal scale learning. This method fully disentangles geometry from scale estimation to suppress drift. Finally, we identify attention bias issues in Transformers, including attention-sink reliance and long-term KV-cache saturation. We propose cache-consistent training combined with periodic cache refresh. This approach suppresses attention biases and contamination over ultra-long sequences and reduces the gap between training and inference. Experiments show that LongStream achieves state-of-the-art performance, enabling stable, metric-scale reconstruction over kilometer-scale sequences at 18 FPS. Project Page: https://3dagentworld.github.io/longstream/
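The keyframe-relative formulation in the abstract can be illustrated with a small pose-chaining sketch. This is not the authors' implementation: the fixed-interval keyframe rule, the function names (`accumulate_poses`, `yaw_translation`, `matmul4`), and the 4x4 homogeneous-matrix convention are all assumptions made for illustration. The sketch only shows the general idea that each frame's pose is expressed relative to the current keyframe and then composed with that keyframe's world pose, so no prediction ever has to span the distance back to the first frame.

```python
import math


def matmul4(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def yaw_translation(yaw, tx, ty, tz):
    """Build a 4x4 rigid transform: rotation about z by `yaw` plus a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


IDENTITY = yaw_translation(0.0, 0.0, 0.0, 0.0)


def accumulate_poses(relative_poses, keyframe_every):
    """Chain keyframe-relative poses into world poses.

    relative_poses[i] is a hypothetical model output: the pose of frame i
    expressed in the coordinate frame of the current keyframe.
    Assumption: every `keyframe_every` frames, the current frame is promoted
    to keyframe. The actual keyframe selection rule is not given in the source.
    """
    world_from_keyframe = IDENTITY
    world_poses = []
    for i, keyframe_from_frame in enumerate(relative_poses):
        # Compose the keyframe's world pose with the local prediction.
        world_from_frame = matmul4(world_from_keyframe, keyframe_from_frame)
        world_poses.append(world_from_frame)
        if (i + 1) % keyframe_every == 0:
            # Promote this frame: later predictions are relative to it.
            world_from_keyframe = world_from_frame
    return world_poses


# Usage: a camera moving 1 m per frame along x, with a new keyframe every 2 frames.
# Relative translations therefore stay small (1 m or 2 m) however long the sequence is.
relative = [yaw_translation(0.0, tx, 0.0, 0.0) for tx in (1.0, 2.0, 1.0, 2.0)]
poses = accumulate_poses(relative, keyframe_every=2)
xs = [round(p[0][3], 6) for p in poses]
print(xs)
```

In this toy setup the relative translations never exceed the keyframe interval, which is the sense in which a long-range extrapolation problem becomes a local one.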

Chong Cheng, Xianda Chen, Tao Xie, Wei Yin, Weiqiang Ren, Qian Zhang, Xiaoyang Guo, Hao Wang • 2026

Related benchmarks

Task | Dataset | Result | Rank
Video Depth Estimation | KITTI | Abs Rel 0.12 | 148
3D Reconstruction | 7 Scenes | -- | 128
3D Geometry Estimation and Reconstruction | SpatialBench Sparse | AbsRel 0.151 | 42
3D Geometry Estimation and Reconstruction | SpatialBench Medium | AbsRel 0.166 | 42
3D Geometry Estimation and Reconstruction | SpatialBench Average across settings | Absolute Relative Error 28 | 42
3D Geometry Estimation and Reconstruction | SpatialBench Single Frame | AbsRel 0.523 | 42
Camera pose estimation | Oxford Spires | ATE 19.815 | 26
SLAM | KITTI | Error K01 46.01 | 25
3D Geometry Estimation and Reconstruction | SpatialBench Dense | AbsRel 0.269 | 24
3D Reconstruction | Oxford Spires | Chamfer Distance (CD) 6.28 | 22
Showing 10 of 24 rows
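
Several rows above report absolute relative error (AbsRel). As a reading aid, here is a minimal sketch of the standard definition, the mean of |predicted - ground truth| / ground truth over valid pixels. The function name `abs_rel` and the "ground truth greater than zero" validity mask are assumptions for illustration. The exact evaluation protocol behind each listed number (scale alignment, depth caps, masking) is not stated on this page and may differ.

```python
def abs_rel(pred, gt):
    """Mean of |pred - gt| / gt over entries with valid (positive) ground truth.

    `pred` and `gt` are flat sequences of depths in the same unit.
    Assumption: entries with gt <= 0 mark missing ground truth and are skipped.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


# Usage: three valid pixels and one pixel without ground truth (gt == 0).
score = abs_rel([1.1, 2.0, 4.0, 7.0], [1.0, 2.0, 5.0, 0.0])
print(f"AbsRel = {score:.3f}")
```

Lower values are better, and because the error is divided by the true depth, a 10 cm error at 1 m counts the same as a 1 m error at 10 m.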
