
LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory

About

Feedforward geometric foundation models achieve strong short-window reconstruction, yet scaling them to minutes-long videos is bottlenecked either by the quadratic cost of attention or by the limited effective memory of recurrent designs. We present LoGeR (Long-context Geometric Reconstruction), a novel architecture that scales dense 3D reconstruction to extremely long sequences without post-optimization. LoGeR processes video streams in chunks, leveraging strong bidirectional priors for high-fidelity intra-chunk reasoning. To maintain coherence across chunk boundaries, we propose a learning-based hybrid memory module. This dual-component system combines a parametric Test-Time Training (TTT) memory, which anchors the global coordinate frame and prevents scale drift, with a non-parametric Sliding Window Attention (SWA) mechanism, which preserves uncompressed context for high-precision alignment of adjacent chunks. Remarkably, this memory architecture allows LoGeR to be trained on 128-frame sequences and to generalize to thousands of frames at inference. Evaluated on standard benchmarks and a newly repurposed VBR dataset with sequences of up to 19k frames, LoGeR substantially outperforms prior state-of-the-art feedforward methods, reducing absolute trajectory error (ATE) on KITTI by over 74%, and achieves robust, globally consistent reconstruction over unprecedented horizons.
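The chunk-wise control flow described in the abstract can be illustrated with a toy sketch. This is not the LoGeR implementation: the class `ToyHybridMemory`, the scalar `fast_weight`, the loss it minimizes, and every parameter name are hypothetical stand-ins chosen for illustration. The sketch only shows the general pattern of pairing a fixed-size state, updated by a gradient step per chunk (TTT-style), with a bounded window of raw recent features (sliding-window style).

```python
from collections import deque
from typing import Deque, Iterator, List, Sequence, Tuple


def iter_chunks(num_frames: int, chunk_size: int) -> Iterator[range]:
    """Yield consecutive, non-overlapping frame-index ranges."""
    for start in range(0, num_frames, chunk_size):
        yield range(start, min(start + chunk_size, num_frames))


class ToyHybridMemory:
    """Toy stand-in for a hybrid memory (an assumption, not the paper's module).

    window      : non-parametric part; raw features of the most recent
                  chunks, kept uncompressed and bounded by `window_chunks`.
    fast_weight : parametric part; a fixed-size state (here one scalar)
                  updated by one gradient step per chunk.
    """

    def __init__(self, window_chunks: int = 1, lr: float = 0.1) -> None:
        self.window: Deque[List[float]] = deque(maxlen=window_chunks)
        self.fast_weight = 0.0
        self.lr = lr

    def read(self) -> Tuple[List[float], float]:
        """Return the uncompressed recent context and the compressed state."""
        context = [x for chunk in self.window for x in chunk]
        return context, self.fast_weight

    def write(self, chunk_features: Sequence[float]) -> None:
        """Update both memories with the chunk that was just processed."""
        # One gradient step on the toy loss L(w) = mean((w - x)^2).
        n = len(chunk_features)
        grad = 2.0 * sum(self.fast_weight - x for x in chunk_features) / n
        self.fast_weight -= self.lr * grad
        # The deque drops the oldest chunk once `maxlen` is exceeded.
        self.window.append(list(chunk_features))


def process_stream(
    features: Sequence[float], chunk_size: int, memory: ToyHybridMemory
) -> List[Tuple[int, int, int]]:
    """Process a stream chunk by chunk; log (start, stop, context length)."""
    log = []
    for idx in iter_chunks(len(features), chunk_size):
        chunk = [features[i] for i in idx]
        context, _state = memory.read()  # memory is read before it is updated
        log.append((idx.start, idx.stop, len(context)))
        memory.write(chunk)
    return log
```

In this sketch the memory footprint does not grow with stream length: the window holds at most `window_chunks` chunks and the parametric state has a fixed size, so the per-chunk cost is the same for a short stream as for a very long one.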

Junyi Zhang, Charles Herrmann, Junhwa Hur, Chen Sun, Ming-Hsuan Yang, Forrester Cole, Trevor Darrell, Deqing Sun • 2026

Related benchmarks

Task | Dataset | Result | Rank
Video Depth Estimation | KITTI | Abs Rel 0.09 | 148
3D Reconstruction | 7 Scenes | -- | 128
3D Geometry Estimation and Reconstruction | SpatialBench Average across settings | Absolute Relative Error 12.9 | 42
3D Geometry Estimation and Reconstruction | SpatialBench Sparse | AbsRel 0.077 | 42
3D Geometry Estimation and Reconstruction | SpatialBench Medium | AbsRel 0.083 | 42
3D Geometry Estimation and Reconstruction | SpatialBench Single Frame | AbsRel 0.2 | 42
Camera pose estimation | Oxford Spires | ATE 18.7 | 26
SLAM | KITTI | Error K01 36.57 | 25
3D Geometry Estimation and Reconstruction | SpatialBench Dense | AbsRel 0.156 | 24
3D Reconstruction | Oxford Spires | Chamfer Distance (CD) 1.92 | 22
Showing 10 of 21 rows
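
For readers unfamiliar with the metrics above, the following sketch shows how absolute relative depth error (AbsRel) and absolute trajectory error (ATE) are commonly computed, along with the relative-reduction arithmetic behind a claim such as "reducing ATE by over 74%". It makes several assumptions: this page does not state the evaluation protocol or units, the ATE function assumes the predicted trajectory has already been aligned to the ground truth (benchmarks usually fit a rigid or similarity transform first), and the function names are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean of |pred - gt| / gt over pixels with positive ground-truth depth."""
    ratios = [abs(p - g) / g for p, g in zip(pred, gt) if g > 0]
    if not ratios:
        raise ValueError("no valid ground-truth depths")
    return sum(ratios) / len(ratios)


def ate_rmse(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """RMSE of per-frame camera position error.

    Assumes `pred` is already expressed in the ground-truth frame;
    the alignment step is deliberately omitted from this sketch.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("trajectories must be non-empty and equal in length")
    sq_errors = [
        sum((a - b) ** 2 for a, b in zip(p, g)) for p, g in zip(pred, gt)
    ]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def relative_reduction(baseline: float, ours: float) -> float:
    """Fraction by which an error metric drops relative to a baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - ours) / baseline
```

As a purely hypothetical illustration of the arithmetic, a baseline error of 100 reduced to 26 gives `relative_reduction(100.0, 26.0)`, a 74% reduction. These numbers are not taken from the paper.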
